CN113342941B - Text search method and device, electronic equipment and computer readable storage medium

Info

Publication number
CN113342941B
Authority
CN
China
Prior art keywords
text
search
search result
word segmentation
result text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110719746.8A
Other languages
Chinese (zh)
Other versions
CN113342941A (en)
Inventor
周峰
刘进
熊英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Trust Co Ltd
Original Assignee
Ping An Trust Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Trust Co Ltd
Priority to CN202110719746.8A
Publication of CN113342941A
Application granted
Publication of CN113342941B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to big data engine technology and discloses a text search method comprising: performing word segmentation and search operations on a search text to obtain a search result text set that includes text type and text score information; outputting the search result text in the set that matches a preset highest priority type; if no matching search result text exists, calculating an average difference value of the search result text set from the text scores; when the average difference value does not meet a preset condition, performing word segmentation and search operations on the search text again until the average difference value meets the preset condition; and then outputting the search result text with the highest text score. The invention also relates to blockchain technology: the search result text set may be stored in blockchain nodes. The invention further provides a text search device, electronic equipment and a storage medium. The invention can solve the problems that existing text search cannot output results according to text type priority and that its accuracy needs to be improved.

Description

Text search method and device, electronic equipment and computer readable storage medium
Technical Field
The invention relates to the field of big data engines, in particular to a text search method and device, electronic equipment and a computer readable storage medium.
Background
Searching by search text through the internet or an enterprise's internal system is one of the important means by which people acquire information. Current text search mainly performs a keyword matching operation in the document library of the internet or the enterprise's internal system according to keywords input by a user to obtain a search result text set, sorts the search result texts in the set by matching degree and outputs them in that order, and, when the matching degrees of the search result texts are the same or similar, outputs the first matched search result text by default.
On the one hand, this text search method cannot meet a user's need to have search results output according to the priorities of different document types; for example, among search result texts of the title class, the document content class and the attachment class, the user may want the title-class search result texts displayed first.
On the other hand, when the matching degrees of the search result texts are the same or similar, the first matched search result text is output by default, and that text may not be the search result the user wants.
Therefore, the current text search method has the problems that the result output cannot be carried out according to the priority of the text type and the accuracy needs to be improved.
Disclosure of Invention
The invention provides a text search method, a text search device and a computer-readable storage medium, and mainly aims to solve the problems that the current text search method cannot output the text according to the priority of the text type and the accuracy needs to be improved.
In order to achieve the above object, the present invention provides a text search method, including:
receiving a search text, and executing word segmentation operation on the search text to obtain a word segmentation segment set;
executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set;
judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type or not;
if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
if the search result text set does not comprise a search result text of which the text type is the preset highest priority type, calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set;
judging whether the average difference value of the search result text set meets a preset condition or not;
if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
if the average difference value does not meet the preset condition, judging whether the search text can be subjected to word segmentation again;
if the search text can be segmented again, performing a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set, and returning to the step of executing the search operation according to the word segmentation segment set;
and if the search text cannot be segmented again, outputting the search result text set in the default search result order.
Optionally, the performing a word segmentation operation on the search text to obtain a word segmentation segment set includes:
allocating a word segmentation granularity set to the search text according to a preset word segmentation granularity table;
obtaining the maximum word segmentation granularity from the word segmentation granularity set;
performing the word segmentation operation on the search text at the maximum word segmentation granularity by using a word segmenter to obtain the word segmentation segment set;
marking the maximum word segmentation granularity as used in the word segmentation granularity set.
Optionally, the executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set includes:
sorting the word segmentation segments in the word segmentation segment set by word frequency value from high to low, and assembling them into a search queue in that order;
sequentially executing fuzzy search operation on the word segmentation segments in the search queue to obtain the search result text set;
scoring each search result text in the search result text set according to the matching degree of each search result text in the search result text set and the search text to obtain a text score of each search result text;
and labeling the text type of each search result text in the search result text set according to a preset text type classification label to obtain the text type of each search result text.
Optionally, the determining whether the average difference value of the search result text set meets a preset condition includes:
judging whether the average difference value is larger than a threshold value specified by the preset condition;
if the average difference value is larger than the threshold value specified by the preset condition, judging that the average difference value of the search result text set meets the preset condition;
and if the average difference is less than or equal to the threshold value specified by the preset condition, judging that the average difference of the search result text set does not meet the preset condition.
Optionally, the determining whether the search text can be re-segmented includes:
judging whether unused word segmentation granularity exists in the word segmentation granularity set;
if the word segmentation granularity set has unused word segmentation granularity, judging that the search text can be re-segmented;
and if the word segmentation granularity set does not have unused word segmentation granularity, judging that the search text can not be segmented again.
Optionally, the performing a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set includes:
obtaining the maximum word segmentation granularity among the unused word segmentation granularities in the word segmentation granularity set;
performing the word segmentation operation on the search text at that granularity by using a word segmenter to obtain the updated word segmentation segment set;
marking that granularity as used in the word segmentation granularity set.
Optionally, after receiving the search text and before performing the word segmentation operation on the search text, the method further includes:
performing punctuation removal, stop-word removal and useless-symbol removal on the search text.
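The optional preprocessing step can be sketched as follows. This is a minimal illustration only: the stop-word list `STOP_WORDS` and the function name `preprocess` are hypothetical, and treating every character that is neither alphanumeric nor whitespace as punctuation or a useless symbol is an assumption, since the source does not define these categories.

```python
# Hypothetical stop-word list; a real system would load a language-specific list.
STOP_WORDS = {"the", "a", "an", "of", "in"}


def preprocess(text):
    """Strip punctuation and other symbols, then drop stop words."""
    # Assumption: anything that is not a letter, digit or whitespace is removable.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    # Drop stop words (case-insensitive) and normalise the spacing.
    return " ".join(w for w in cleaned.split() if w.lower() not in STOP_WORDS)


print(preprocess("The car-manufacturing factory, in China!"))
```

The cleaned string would then be passed on to the word segmentation operation.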
In order to solve the above problem, the present invention also provides a text search apparatus, comprising:
the word segmentation and search module is used for receiving a search text and performing a word segmentation operation on the search text to obtain a word segmentation segment set; and for executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set;
the text type judging and outputting module is used for judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type; if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
the text score judging and outputting module is used for calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set if the search result text set does not comprise the search result text of which the text type is the preset highest priority type; judging whether the average difference value of the search result text set meets a preset condition or not; if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
the word segmentation judging and outputting module is used for judging whether the search text can be segmented again if the average difference value does not meet the preset condition; if the search text can be segmented again, performing a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set, and returning to the step of executing the search operation according to the word segmentation segment set; and if the search text cannot be segmented again, outputting the search result text set in the default search result order.
In order to solve the above problem, the present invention also provides an electronic device, including:
a memory storing at least one instruction; and
a processor that executes the instruction stored in the memory to implement the text search method described above.
In order to solve the above problem, the present invention further provides a computer-readable storage medium, in which at least one instruction is stored, and the at least one instruction is executed by a processor in an electronic device to implement the text search method described above.
The embodiment of the invention performs word segmentation and search operations on the search text to obtain a search result text set that includes text type and text score information, and outputs the search result text in the set that matches a preset highest priority type. If no matching search result text exists, an average difference value of the search result text set is calculated from the text score information. When the average difference value meets a preset condition, that is, the matching degrees of the search result texts in the search result text set are not close, the search result text with the highest text score is output. When the average difference value does not meet the preset condition, that is, the matching degrees of the search result texts are the same or close, finer-grained word segmentation and search operations are performed on the search text again. This repeats until a search result text matching the preset highest priority type is output, until the average difference value of a new search result text set meets the preset condition and the search result text with the highest text score is output, or until the search text cannot be segmented again, in which case the results are output in the default search result order. The accuracy of text search is thereby improved. Therefore, the invention can solve the problems that the current text search method cannot output results according to the priority of the text type and that its accuracy needs to be improved.
Drawings
FIG. 1 is a schematic flowchart of a text search method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a detailed implementation of step S1 of the text search method shown in FIG. 1;
FIG. 3 is a flowchart illustrating a detailed implementation of step S2 of the text search method shown in FIG. 1;
FIG. 4 is a flowchart illustrating a detailed implementation of step S9 of the text search method shown in FIG. 1;
FIG. 5 is a functional block diagram of a text search apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an electronic device implementing the text search method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the application provides a text searching method. The execution subject of the text search method includes, but is not limited to, at least one of electronic devices such as a server and a terminal that can be configured to execute the method provided by the embodiments of the present application. In other words, the text search method may be performed by software or hardware installed in the terminal device or the server device, and the software may be a blockchain platform. The server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
FIG. 1 is a schematic flowchart of a text search method according to an embodiment of the present invention.
In this embodiment, the text search method includes:
S1, receiving a search text, and performing a word segmentation operation on the search text to obtain a word segmentation segment set;
In the embodiment of the invention, the search text refers to the character information to be retrieved, and may include Chinese characters, numbers, letters, punctuation and the like. For example, when "car manufacturing factory" is searched for through an internet website, "car manufacturing factory" is the search text.
The word segmentation operation means that the search text is split according to information such as spaces, numbers, characters, e-mail addresses, IP addresses and the like to obtain a number of independent words or phrases, each of which is a word segmentation segment of the search text.
In detail, referring to fig. 2, the S1 includes:
S11, allocating a word segmentation granularity set to the search text according to a preset word segmentation granularity table;
S12, obtaining the maximum word segmentation granularity from the word segmentation granularity set;
S13, performing the word segmentation operation on the search text at the maximum word segmentation granularity by using a word segmenter to obtain the word segmentation segment set;
S14, marking the maximum word segmentation granularity as used in the word segmentation granularity set.
In the embodiment of the present invention, the preset word segmentation granularity table defines the different word segmentation granularities. A word segmentation granularity measures the amount of information contained in each word segmentation segment of the word segmentation segment set; generally, the more information a word segmentation segment contains, the larger the corresponding word segmentation granularity.
In the embodiment of the present invention, a word segmentation granularity set containing different word segmentation granularities may be allocated to the search text according to the length of the search text and the complexity of its character types; a word segmentation granularity may specify 1 character, 2 characters or 3 characters as one word segmentation segment. For example, if the search text is "basketball court" (three characters in the original Chinese), it may be segmented at a granularity of 2 characters or of 1 character, yielding the word segmentation segments "basketball" and "court", or "basket", "ball" and "court", respectively; the word segmentation granularity set of this search text therefore contains two word segmentation granularities, 1 character and 2 characters.
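Steps S11 to S14 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class `GranularitySet`, the functions `assign_granularities` and `segment`, the table `(3, 2, 1)` and the allocation rule (keep every granularity shorter than the text) are all hypothetical, and fixed-length character chunks stand in for a real word segmenter. A three-character placeholder string is used as the search text.

```python
from dataclasses import dataclass, field


@dataclass
class GranularitySet:
    """Granularities (characters per segment) allocated to one search text."""
    sizes: list
    used: set = field(default_factory=set)

    def largest_unused(self):
        # Returns None once every granularity has been tried.
        remaining = [s for s in self.sizes if s not in self.used]
        return max(remaining) if remaining else None

    def mark_used(self, size):
        self.used.add(size)


def assign_granularities(text, table=(3, 2, 1)):
    # S11 (assumed rule): keep the table entries that are shorter than the text.
    return GranularitySet([g for g in table if g < len(text)] or [1])


def segment(text, size):
    # Stand-in for a word segmenter: consecutive chunks of `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]


query = "ABC"                        # three-character placeholder search text
gset = assign_granularities(query)   # S11: granularities 2 and 1
size = gset.largest_unused()         # S12: maximum granularity
fragments = segment(query, size)     # S13: segment at that granularity
gset.mark_used(size)                 # S14: mark it as used
print(size, fragments, gset.largest_unused())
```

Keeping the used flags on the set itself means a later re-segmentation only needs to call `largest_unused()` again.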
S2, executing a search operation according to the word segmentation segment set to obtain a search result text set and the text type and text score of each search result text in the search result text set;
In the embodiment of the present invention, an ES (Elasticsearch) search engine may be used to perform the search operation on the word segmentation segments. Elasticsearch is a distributed full-text search engine with multi-user capabilities.
In detail, referring to fig. 3, the S2 includes:
S21, sorting the word segmentation segments in the word segmentation segment set by word frequency value from high to low, and assembling them into a search queue in that order;
S22, sequentially executing a fuzzy search operation on the word segmentation segments in the search queue to obtain the search result text set;
S23, scoring each search result text in the search result text set according to its matching degree with the search text to obtain the text score of each search result text;
S24, labeling the text type of each search result text in the search result text set according to preset text type classification labels to obtain the text type of each search result text.
In the embodiment of the invention, fuzzy search means that a certain difference is allowed between the retrieved information and the search text; this difference is what "fuzzy" refers to. For example, when searching for the name Smith, the retrieved information may include similar names such as Smith, Smythe, Smyth and Smitt.
In the embodiment of the present invention, the preset text type classification labels generally include three labels: title, content and attachment, where content refers to the content information of a document or web page.
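Steps S21 to S24 can be sketched against a small in-memory document library. Everything here is an assumption made for illustration: the source uses a search engine such as Elasticsearch rather than this toy matcher, the `LIBRARY` contents are invented, word frequency is assumed to be counted over the library text, and the standard-library `difflib.SequenceMatcher` ratio stands in for both the fuzzy match and the matching-degree score.

```python
from difflib import SequenceMatcher

# Hypothetical document library: (text type, text) pairs, already labelled.
LIBRARY = [
    ("title", "car manufacturing plant overview"),
    ("content", "the factory builds cars and trucks"),
    ("attachment", "annual manufacturing report"),
]


def build_queue(fragments, corpus_text):
    # S21: order segments by how often they occur in the corpus, highest first.
    return sorted(fragments, key=corpus_text.count, reverse=True)


def fuzzy_hit(fragment, text, cutoff=0.8):
    # S22: a segment matches when some word of the text is similar enough to it.
    return any(SequenceMatcher(None, fragment, word).ratio() >= cutoff
               for word in text.split())


def search(fragments, query):
    corpus_text = " ".join(text for _, text in LIBRARY)
    results = []
    for fragment in build_queue(fragments, corpus_text):
        for text_type, text in LIBRARY:
            seen = any(r["text"] == text for r in results)
            if not seen and fuzzy_hit(fragment, text):
                # S23: score by similarity between the search text and the result.
                score = SequenceMatcher(None, query, text).ratio()
                # S24: carry the text type label along with the result.
                results.append({"text": text, "type": text_type, "score": score})
    return results


hits = search(["car", "manufacturing", "factory"], "car manufacturing factory")
for hit in hits:
    print(hit["type"], round(hit["score"], 2), hit["text"])
```

Each result is a plain dictionary with `text`, `type` and `score` keys, which is the shape the later priority and score checks need.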
S3, judging whether the search result text set comprises a search result text with a text type of a preset highest priority type;
In the embodiment of the invention, the search result text set is prioritized by text type according to users' search habits: the title is preset as the highest priority type, the content has the next highest priority, and the attachment has the lowest priority. In practical applications, any one of the text types may be set as the highest priority according to the actual situation.
S4, if the search result text set comprises a search result text with a text type of the preset highest priority type, outputting the search result text with the highest priority type;
In the embodiment of the invention, if the text type of a search result text is the title, that search result text is output. In practical applications, the search result text set may also be sorted by text type priority and output in that order.
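The type check in S3 and S4, together with the alternative of sorting the whole set by type priority, can be sketched as follows. The result dictionaries, the function names and the priority order `("title", "content", "attachment")` are assumptions for illustration.

```python
TYPE_PRIORITY = ("title", "content", "attachment")  # assumed order, highest first


def select_highest_priority(results, highest=TYPE_PRIORITY[0]):
    # S3/S4: return the results of the highest priority type (empty if none).
    return [r for r in results if r["type"] == highest]


def sort_by_type_priority(results, order=TYPE_PRIORITY):
    # Alternative output: the whole set ordered by text type priority.
    rank = {text_type: i for i, text_type in enumerate(order)}
    return sorted(results, key=lambda r: rank[r["type"]])


results = [
    {"text": "report.pdf", "type": "attachment", "score": 0.7},
    {"text": "Plant overview", "type": "title", "score": 0.6},
    {"text": "The plant builds cars", "type": "content", "score": 0.9},
]
print(select_highest_priority(results))
print([r["type"] for r in sort_by_type_priority(results)])
```

An empty list from `select_highest_priority` corresponds to the branch in which the method falls through to the score-based check.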
S5, if the search result text set does not include the search result text with the text type being the preset highest priority type, calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set;
In the embodiment of the present invention, the search result texts in the search result text set may come from different documents and different attachments, and their text scores may differ; the average difference value of the search result text set is calculated from the text score of each search result text.
S6, judging whether the average difference value of the search result text set meets a preset condition or not;
The preset condition may be a threshold or a threshold interval set in advance according to experimental data, for example 0.5 or 0 to 1.9; the judgment is then whether the average difference value is greater than the threshold, or whether the average difference value lies outside the threshold interval.
In detail, the determining whether the average difference value of the search result text set meets a preset condition includes: judging whether the average difference value is larger than the threshold value specified by the preset condition; if the average difference value is larger than the threshold value, the preset condition is met; and if the average difference value is less than or equal to the threshold value, the preset condition is not met.
In another embodiment of the present invention, the preset condition may specify a threshold interval: if the average difference value is not within the threshold interval, the average difference value meets the preset condition; if it is within the threshold interval, it does not meet the preset condition.
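The score check in S5 and S6 can be sketched as follows. The source does not give a formula for the average difference value; this sketch assumes it is the mean absolute deviation of the text scores from their mean, which is one common reading of the term. The function names are hypothetical, and the default threshold 0.5 and interval 0 to 1.9 are the example values quoted above.

```python
from statistics import fmean


def average_difference(scores):
    # Assumption: mean absolute deviation of the scores from their mean.
    mean = fmean(scores)
    return fmean([abs(s - mean) for s in scores])


def meets_threshold(avg_diff, threshold=0.5):
    # Threshold variant: satisfied only when strictly greater than the threshold.
    return avg_diff > threshold


def meets_interval(avg_diff, low=0.0, high=1.9):
    # Interval variant: satisfied only when the value lies outside the interval.
    return not (low <= avg_diff <= high)


spread = average_difference([1.0, 2.0, 3.0])   # scores clearly apart
close = average_difference([2.0, 2.1, 2.2])    # scores nearly identical
print(meets_threshold(spread), meets_threshold(close))
```

A small value means the scores are bunched together, which is the case in which the method tries a finer segmentation instead of trusting the top score.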
S7, if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
In the embodiment of the invention, if the average difference value meets the preset condition, it is determined that the text scores of the search result texts in the search result text set differ considerably and are not close.
S8, if the average difference value does not meet the preset condition, judging whether the search text can be subjected to word segmentation again;
In the embodiment of the present invention, if the average difference value does not meet the preset condition, it is determined that the text scores of the search result texts in the search result text set are relatively close, and it is necessary to further determine whether the search text can be segmented again.
In detail, if the average difference value does not meet the preset condition, the determining whether the search text can be segmented again includes: judging whether an unused word segmentation granularity exists in the word segmentation granularity set; if an unused word segmentation granularity exists, judging that the search text can be segmented again; and if no unused word segmentation granularity exists, judging that the search text cannot be segmented again.
In the embodiment of the present invention, if no unused word segmentation granularity exists in the word segmentation granularity set, a word segmentation operation has already been performed on the search text at every word segmentation granularity in the set; no usable word segmentation granularity remains, and the search text cannot be segmented again.
S9, if the search text can be segmented again, performing a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set, and returning to S2;
In the embodiment of the present invention, if the average difference value does not meet the preset condition and the search text can be segmented again, the word segmentation and retrieval operations are performed on the search text again until a search result text matching the preset highest priority type is output, or until the average difference value of a new search result text set meets the preset condition and the search result text with the highest text score is output, or until the search text cannot be segmented again.
In detail, referring to fig. 4, the S9 includes:
S91, obtaining the largest word segmentation granularity among the unused word segmentation granularities in the word segmentation granularity set;
S92, performing the word segmentation operation on the search text at that granularity by using a word segmenter to obtain an updated word segmentation segment set;
S93, marking that granularity as used in the word segmentation granularity set.
In the embodiment of the invention, re-executing the word segmentation and retrieval operations on the search text can improve the accuracy of text search.
S10, if the search text cannot be segmented again, outputting the search result text set in the default search result order.
In the embodiment of the present invention, the search result text set may also be output selectively; for example, the search result texts are sorted from high to low by text score, and the five search result texts with the highest scores are selected and output.
The method performs word segmentation and search operations on the search text to obtain a search result text set that includes text type and text score information, and outputs the search result text in the set that matches a preset highest priority type. If no matching search result text exists, an average difference value of the search result text set is calculated from the text score information. When the average difference value meets a preset condition, that is, the matching degrees of the search result texts in the search result text set are not close, the search result text with the highest text score is output. When the average difference value does not meet the preset condition, that is, the matching degrees of the search result texts are the same or close, word segmentation and search operations are performed on the search text again. This repeats until a search result text matching the preset highest priority type is output, until the average difference value of a new search result text set meets the preset condition and the search result text with the highest text score is output, or until the search text cannot be segmented again, in which case the results are output in the default search result order. The accuracy of text search is thereby improved.
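The overall control flow of S1 to S10 can be sketched as one loop. This is a simplified illustration under assumptions: `fake_search` is a hypothetical stand-in for the real search backend and returns invented results, fixed-length character chunks stand in for a word segmenter, the average difference value is assumed to be a mean absolute deviation, and the threshold 0.5 is the example value from the description.

```python
def segment(text, size):
    # Stand-in segmenter: consecutive chunks of `size` characters.
    return [text[i:i + size] for i in range(0, len(text), size)]


def mean_abs_dev(scores):
    # Assumed definition of the average difference value.
    mean = sum(scores) / len(scores)
    return sum(abs(s - mean) for s in scores) / len(scores)


def fake_search(fragments):
    # Hypothetical backend: coarse segments give close scores, fine ones do not.
    if len(fragments) <= 2:
        return [{"text": "doc A", "type": "content", "score": 1.0},
                {"text": "doc B", "type": "attachment", "score": 1.1}]
    return [{"text": "doc A", "type": "content", "score": 3.0},
            {"text": "doc B", "type": "attachment", "score": 1.0}]


def text_search(query, granularities, search, threshold=0.5, highest_type="title"):
    unused = sorted(granularities, reverse=True)
    results = []
    while unused:
        size = unused.pop(0)                    # S12 / S91: largest unused granularity
        results = search(segment(query, size))  # S13 / S92, then S2
        top = [r for r in results if r["type"] == highest_type]
        if top:                                 # S3 / S4: highest priority type found
            return top
        scores = [r["score"] for r in results]
        if scores and mean_abs_dev(scores) > threshold:       # S5 / S6
            return [max(results, key=lambda r: r["score"])]   # S7: best score
    return results                              # S10: default order of the last search


print(text_search("ABC", [1, 2], fake_search))
```

With granularities 2 and 1, the first pass yields close scores, so the loop re-segments at granularity 1 and then returns the single highest-scoring result; with only granularity 2 available it falls back to the default order.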
FIG. 5 is a functional block diagram of a text search apparatus according to an embodiment of the present invention.
The text search apparatus 100 according to the present invention may be installed in an electronic device. According to the functions implemented, the text search apparatus 100 may include a word segmentation and search module 101, a text type determination and output module 102, a text score determination and output module 103, and a word segmentation determination and output module 104. A module of the present invention, which may also be referred to as a unit, is a series of computer program segments that can be executed by a processor of an electronic device to perform a fixed function and that are stored in a memory of the electronic device.
In the present embodiment, the functions regarding the respective modules/units are as follows:
the word segmentation and search module 101 is configured to receive a search text and perform a word segmentation operation on the search text to obtain a word segmentation segment set; and to execute a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set;
the text type determining and outputting module 102 is configured to determine whether the search result text set includes a search result text of which a text type is a preset highest priority type; if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
the text score judging and outputting module 103 is configured to calculate an average difference value of the search result text set according to the text score of each search result text in the search result text set if the search result text set does not include a search result text whose text type is the preset highest priority type; judging whether the average difference value of the search result text set meets a preset condition or not; if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
the word segmentation judging and outputting module 104 is configured to judge whether the search text can be segmented again if the average difference value does not satisfy the preset condition; if the search text can be segmented again, perform a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set, and return to the step of performing the search operation according to the word segmentation segment set; and if the search text cannot be segmented again, output the search result text set sorted in the default search result order.
In detail, when the modules in the text search apparatus 100 according to the embodiment of the present invention are used, the same technical means as the text search method described in fig. 1 to fig. 4 are adopted, and the same technical effects can be produced, which is not described herein again.
Fig. 6 is a schematic structural diagram of an electronic device implementing a text search method according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as a text search program, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, which includes flash memory, removable hard disk, multimedia card, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, e.g. a removable hard disk of the electronic device 1. The memory 11 may also be an external storage device of the electronic device 1 in other embodiments, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only to store application software installed in the electronic device 1 and various types of data, such as codes of a text search program, etc., but also to temporarily store data that has been output or is to be output.
The processor 10 may in some embodiments be formed of integrated circuits, for example a single packaged integrated circuit, or a plurality of packaged integrated circuits with the same or different functions, including one or more Central Processing Units (CPUs), microprocessors, digital processing chips, graphics processors, and combinations of various control chips. The processor 10 is the control unit of the electronic device 1; it connects the components of the whole electronic device through various interfaces and lines, and performs the functions of the electronic device 1 and processes its data by running or executing programs or modules (e.g., the text search program) stored in the memory 11 and calling data stored in the memory 11.
The bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The bus is arranged to enable connection communication between the memory 11 and at least one processor 10 or the like.
Fig. 6 only shows an electronic device with components, and it will be understood by a person skilled in the art that the structure shown in fig. 6 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or a combination of certain components, or a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power supply (such as a battery) for supplying power to each component, and preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, power consumption management, and the like through the power management device. The power supply may also include any component of one or more dc or ac power sources, recharging devices, power failure detection circuitry, power converters or inverters, power status indicators, and the like. The electronic device 1 may further include various sensors, a bluetooth module, a Wi-Fi module, and the like, which are not described herein again.
Further, the electronic device 1 may include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface or a Bluetooth interface), which is generally used to establish a communication connection between the electronic device 1 and other electronic devices.
Optionally, the electronic device 1 may further comprise a user interface, which may be a Display (Display), an input unit (such as a Keyboard), and optionally a standard wired interface, a wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch device, or the like. The display, which may also be referred to as a display screen or display unit, is suitable for displaying information processed in the electronic device 1 and for displaying a visualized user interface, among other things.
It is to be understood that the described embodiments are for purposes of illustration only and that the scope of the appended claims is not limited to such structures.
The text search program stored in the memory 11 of the electronic device 1 is a combination of instructions, which when executed in the processor 10, can implement:
receiving a search text, and executing word segmentation operation on the search text to obtain a word segmentation segment set;
executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set;
judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type or not;
if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
if the search result text set does not comprise a search result text of which the text type is the preset highest priority type, calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set;
judging whether the average difference value of the search result text set meets a preset condition or not;
if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
if the average difference value does not meet the preset condition, judging whether the search text can be subjected to word segmentation again;
if the search text can be segmented again, performing fine-grained segmentation operation on the search text to obtain an updated segmentation fragment set, and returning to the step of performing the search operation according to the segmentation fragment set;
and if the search text cannot be segmented again, sorting the search result text set in the default search result order and outputting the search result text set.
Specifically, the specific implementation method of the processor 10 for the instruction may refer to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not described herein again.
Further, the integrated modules/units of the electronic device 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer-readable storage medium. The computer-readable storage medium may be volatile or non-volatile. For example, the computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
The present invention also provides a computer-readable storage medium, storing a computer program which, when executed by a processor of an electronic device, may implement:
receiving a search text, and performing word segmentation operation on the search text to obtain a word segmentation fragment set;
executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set;
judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type or not;
if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
if the search result text set does not comprise the search result text of which the text type is the preset highest priority type, calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set;
judging whether the average difference value of the search result text set meets a preset condition or not;
if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
if the average difference value does not meet the preset condition, judging whether the search text can be subjected to word segmentation again;
if the search text can be segmented again, performing fine-grained segmentation operation on the search text to obtain an updated segmentation fragment set, and returning to the step of performing the search operation according to the segmentation fragment set;
and if the search text cannot be segmented again, sorting the search result text set in the default search result order and outputting the search result text set.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus, device and method can be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is only one logical functional division, and other divisions may be realized in practice.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof.
The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Furthermore, it is obvious that the term "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (9)

1. A text search method, the method comprising:
receiving a search text, and executing word segmentation operation on the search text to obtain a word segmentation segment set;
executing a search operation according to the word segmentation segment set to obtain a search result text set and a text type and a text score of each search result text in the search result text set, which comprises: sorting the word segmentation segments in the word segmentation segment set in descending order of word frequency value, and assembling them into a search queue in that order; sequentially executing a fuzzy search operation on the word segmentation segments in the search queue to obtain the search result text set; scoring each search result text in the search result text set according to its matching degree with the search text to obtain the text score of each search result text; and labeling each search result text in the search result text set with a text type according to preset text type classification labels to obtain the text type of each search result text;
judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type or not;
if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
if the search result text set does not comprise the search result text of which the text type is the preset highest priority type, calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set;
judging whether the average difference value of the search result text set meets a preset condition or not;
if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
if the average difference value does not meet the preset condition, judging whether the search text can be subjected to word segmentation again;
if the search text can be segmented again, performing fine-grained segmentation operation on the search text to obtain an updated segmentation fragment set, and returning to the step of performing the search operation according to the segmentation fragment set;
and if the search text cannot be segmented again, sorting the search result text set in the default search result order and outputting the search result text set.
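By way of illustration only, the search step recited in claim 1 (ordering the segments by word frequency value, searching them in turn, then scoring and labeling the hits) can be sketched as follows. All names here (`build_search_queue`, `search_corpus`, `word_freq`, `corpus`, `classify`) are hypothetical. Substring containment stands in for the fuzzy search operation, and `difflib.SequenceMatcher.ratio()` stands in for the matching degree; the claim does not specify either.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def build_search_queue(segments: List[str], word_freq: Dict[str, int]) -> List[str]:
    # Highest word frequency value first; unknown segments count as 0.
    return sorted(segments, key=lambda s: word_freq.get(s, 0), reverse=True)


def search_corpus(
    query: str,
    segments: List[str],
    word_freq: Dict[str, int],
    corpus: List[str],
    classify: Callable[[str], str],
) -> List[dict]:
    hits: List[str] = []
    for seg in build_search_queue(segments, word_freq):
        for doc in corpus:
            # Assumption: substring containment as a stand-in for fuzzy search.
            if seg in doc and doc not in hits:
                hits.append(doc)
    return [
        {
            "text": doc,
            # Text type taken from a caller-supplied classifier (the preset labels).
            "type": classify(doc),
            # Assumption: similarity ratio in [0, 1] as the matching degree.
            "score": SequenceMatcher(None, query, doc).ratio(),
        }
        for doc in hits
    ]
```

The result list keeps the order in which documents were first hit, which is what this sketch treats as the default search result order.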
2. The text search method of claim 1, wherein said performing a word segmentation operation on said search text to obtain a set of word segmentation segments comprises:
distributing a word segmentation granularity set for the search text according to a preset word segmentation granularity table;
obtaining the maximum word segmentation granularity from the word segmentation granularity set;
performing word segmentation operation on the search text according to the maximum word segmentation granularity by using a word segmentation device to obtain a word segmentation segment set;
marking, in the word segmentation granularity set, the maximum word segmentation granularity as used.
3. The text search method of claim 1, wherein the determining whether the average difference value of the search result text set satisfies a preset condition comprises:
judging whether the average difference value is larger than a threshold value specified by the preset condition;
if the average difference value is larger than the threshold value specified by the preset condition, judging that the average difference value of the search result text set meets the preset condition;
and if the average difference is less than or equal to the threshold value specified by the preset condition, judging that the average difference of the search result text set does not meet the preset condition.
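As an illustrative sketch of the threshold test in claim 3, the functions below compute an average difference value and compare it with a threshold. The formula is an assumption: the average difference is taken here to be the mean absolute deviation of the text scores from their mean, which the text in this section does not spell out. The names `average_difference`, `meets_preset_condition` and `threshold` are hypothetical.

```python
from typing import List


def average_difference(scores: List[float]) -> float:
    """Mean absolute deviation of the scores from their mean (assumed definition)."""
    if not scores:
        return 0.0
    mean = sum(scores) / len(scores)
    return sum(abs(s - mean) for s in scores) / len(scores)


def meets_preset_condition(scores: List[float], threshold: float) -> bool:
    # Strictly greater than the threshold, as in claim 3; equality does not qualify.
    return average_difference(scores) > threshold
```

With identical scores the average difference is zero, so no positive threshold is met and the method moves on to re-segmentation.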
4. The text search method of claim 2, wherein said determining whether said search text can be segmented again comprises:
judging whether unused word segmentation granularity exists in the word segmentation granularity set;
if the word segmentation granularity set has unused word segmentation granularity, judging that the search text can be re-segmented;
and if the word segmentation granularity set does not have unused word segmentation granularity, judging that the search text can not be segmented any more.
5. The text search method of claim 2, wherein said performing fine-grained word segmentation operations on said search text to obtain an updated set of word-segmented fragments comprises:
obtaining the maximum word segmentation granularity among the unused word segmentation granularities in the word segmentation granularity set;
performing word segmentation operation on the search text by using a word segmentation device according to the maximum word segmentation granularity in the unused word segmentation granularities to obtain an updated word segmentation segment set;
marking, in the word segmentation granularity set, the maximum word segmentation granularity among the unused word segmentation granularities as used.
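The bookkeeping recited in claims 2, 4 and 5 (take the largest granularity, mark it used, and later take the largest unused one) can be sketched as below. This is a hedged illustration: `GranularitySet` and `segment` are hypothetical names, a granularity is assumed to be an integer where a larger value means coarser segments, and the fixed-width chunking is only a toy stand-in for a real word segmentation device.

```python
from typing import Dict, Iterable, List


class GranularitySet:
    """Tracks which word segmentation granularities have already been used."""

    def __init__(self, granularities: Iterable[int]) -> None:
        self._used: Dict[int, bool] = {g: False for g in granularities}

    def has_unused(self) -> bool:
        # Claim 4: the search text can be segmented again only if this is True.
        return not all(self._used.values())

    def take_largest_unused(self) -> int:
        # Claims 2 and 5: pick the largest unused granularity and mark it used.
        # Raises ValueError if every granularity has already been used.
        g = max(g for g, used in self._used.items() if not used)
        self._used[g] = True
        return g


def segment(text: str, granularity: int) -> List[str]:
    # Toy segmenter (assumption): fixed-width chunks of `granularity` characters.
    return [text[i:i + granularity] for i in range(0, len(text), granularity)]
```

A caller would check `has_unused()` before each retry and pass the value from `take_largest_unused()` to the segmenter, so each round is finer-grained than the last.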
6. The text search method of claim 1, wherein after receiving the search text and before performing the word segmentation operation on the search text, the method further comprises:
performing punctuation removal, stop-word removal and useless symbol removal on the search text.
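A minimal sketch of the preprocessing in claim 6 is given below for illustration. The stop-word list and the name `preprocess` are hypothetical, and stop words are filtered on whitespace-delimited tokens, which suits space-separated text only; for Chinese text the stop-word filter would more plausibly run on the output of the word segmentation device.

```python
import re
from typing import Iterable

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = frozenset({"the", "a", "of", "please"})


def preprocess(text: str, stop_words: Iterable[str] = STOP_WORDS) -> str:
    stop = set(stop_words)
    # Replace punctuation and other non-word symbols with spaces.
    # \w matches Unicode word characters, so CJK characters are preserved.
    cleaned = re.sub(r"[^\w\s]", " ", text)
    # The underscore counts as a word character; treat it as a useless symbol too.
    cleaned = cleaned.replace("_", " ")
    # Drop stop words (case-insensitive) and collapse repeated whitespace.
    tokens = [t for t in cleaned.split() if t.lower() not in stop]
    return " ".join(tokens)
```

For example, full-width punctuation between two Chinese phrases is replaced by a single space, leaving the phrases themselves intact for the segmentation step.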
7. An apparatus for text search, the apparatus comprising:
the word segmentation and search module is used for receiving a search text and executing a word segmentation operation on the search text to obtain a word segmentation segment set; and for executing a search operation according to the word segmentation segment set to obtain a search result text set and the text type and text score of each search result text in the search result text set, which comprises: sorting the word segmentation segments in the word segmentation segment set in descending order of word frequency value, and assembling them into a search queue in that order; sequentially executing a fuzzy search operation on the word segmentation segments in the search queue to obtain the search result text set; scoring each search result text in the search result text set according to its matching degree with the search text to obtain the text score of each search result text; and labeling each search result text in the search result text set with a text type according to preset text type classification labels to obtain the text type of each search result text;
the text type judging and outputting module is used for judging whether the search result text set comprises a search result text of which the text type is a preset highest priority type; if the search result text set comprises a search result text of which the text type is the preset highest priority type, outputting the search result text of which the text type is the highest priority type;
the text score judging and outputting module is used for calculating to obtain an average difference value of the search result text set according to the text score of each search result text in the search result text set if the search result text set does not comprise the search result text of which the text type is the preset highest priority type; judging whether the average difference value of the search result text set meets a preset condition or not; if the average difference value meets the preset condition, outputting a search result text with the highest text score in the search result text set;
the word segmentation judging and outputting module is used for judging whether the search text can be segmented again if the average difference value does not meet the preset condition; if the search text can be segmented again, performing a finer-grained word segmentation operation on the search text to obtain an updated word segmentation segment set, and returning to the step of performing the search operation according to the word segmentation segment set; and if the search text cannot be segmented again, sorting the search result text set in the default search result order and outputting the search result text set.
8. An electronic device, characterized in that the electronic device comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a text search method as claimed in any one of claims 1 to 6.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out a text search method according to any one of claims 1 to 6.
CN202110719746.8A 2021-06-28 2021-06-28 Text search method and device, electronic equipment and computer readable storage medium Active CN113342941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110719746.8A CN113342941B (en) 2021-06-28 2021-06-28 Text search method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113342941A CN113342941A (en) 2021-09-03
CN113342941B true CN113342941B (en) 2022-08-26

Family

ID=77479029

Country Status (1)

Country Link
CN (1) CN113342941B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103699574A (en) * 2013-11-28 2014-04-02 安徽科大讯飞信息科技股份有限公司 Retrieval optimization method and system for complex retrieval formula
CN107391535A (en) * 2017-04-20 2017-11-24 阿里巴巴集团控股有限公司 The method and device of document is searched in document application
CN110399459A (en) * 2019-07-16 2019-11-01 北京字节跳动网络技术有限公司 Searching method, device, terminal, server and the storage medium of online document

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100849272B1 (en) * 2001-11-23 2008-07-29 주식회사 엘지이아이 Method for automatically summarizing Markup-type documents
US11061913B2 (en) * 2018-11-30 2021-07-13 International Business Machines Corporation Automated document filtration and priority scoring for document searching and access

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant