CN114386431A - Hot update method for sentence pattern resource library, sentence pattern recommendation method and related device - Google Patents

Hot update method for sentence pattern resource library, sentence pattern recommendation method and related device

Info

Publication number
CN114386431A
Authority
CN
China
Prior art keywords
sentence
new
obtaining
similarity
resource library
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111626947.XA
Other languages
Chinese (zh)
Inventor
张星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202111626947.XA priority Critical patent/CN114386431A/en
Publication of CN114386431A publication Critical patent/CN114386431A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/194Calculation of difference between files

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a hot update method for a sentence pattern resource library, a sentence pattern recommendation method and a related device. The hot update method comprises: obtaining a plurality of new sentences accumulated online by a front-end application system; for each new sentence, obtaining a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library; in response to at least one first similarity of the current new sentence falling within a threshold range, obtaining the semantic tag to which the new sentence belongs and putting the new sentence into a first database; and extracting at least one new sentence from the first database, converting it into a vectorized representation, and storing it under the corresponding semantic tag in the sentence pattern resource library. In this way, the sentence pattern resource library can be updated in a timely manner, which improves the sentence pattern recommendation results.

Description

Hot update method for sentence pattern resource library, sentence pattern recommendation method and related device
Technical Field
The application belongs to the technical field of natural language processing, and particularly relates to a hot update method for a sentence pattern resource library, a sentence pattern recommendation method and a related device.
Background
With the development of technologies such as the Internet and big data, every industry continuously generates massive amounts of data. Quickly mining similar sentences from this data is increasingly important in business, and is of great significance for reducing labor costs and completing effect optimization quickly. Taking a financial scenario as an example, a bank's customer service generates thousands of call recordings every day, and these recordings contain sentences that have similar meanings but are expressed differently. Given a semantic tag and a corresponding existing sentence, finding multiple similar extended sentences in the massive data plays an increasingly important role in responding quickly to business requirements and shortening the business optimization cycle. It is therefore extremely important that sentence pattern recommendation be accurate and real-time.
At present, the sentences in a sentence pattern resource library are relatively fixed, and newly accumulated sentences cannot be added to the library in time, so the recommended sentences are relatively outdated and the customer experience is poor.
Disclosure of Invention
The application provides a hot update method for a sentence pattern resource library, a sentence pattern recommendation method and a related device, which can update the sentence pattern resource library in a timely manner and thereby improve the sentence pattern recommendation results.
To solve the above technical problem, one technical solution adopted by the application is to provide a hot update method for a sentence pattern resource library, comprising: obtaining a plurality of new sentences accumulated online by a front-end application system; for each new sentence, obtaining a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library; in response to at least one first similarity of the current new sentence falling within a threshold range, obtaining the semantic tag to which the new sentence belongs and putting the new sentence into a first database; and extracting at least one new sentence from the first database, converting it into a vectorized representation, and storing it under the corresponding semantic tag in the sentence pattern resource library.
To solve the above technical problem, another technical solution adopted by the application is to provide a sentence pattern recommendation method, comprising: obtaining an input sentence; obtaining a third similarity between the input sentence and each existing sentence in the sentence pattern resource library, wherein the sentence pattern resource library is updated by the hot update method of any embodiment; and outputting a plurality of existing sentences with higher third similarity.
To solve the above technical problem, another technical solution adopted by the application is to provide a hot update device for a sentence pattern resource library, comprising: an obtaining module, configured to obtain a plurality of new sentences accumulated online by a front-end application system; a filtering module, connected to the obtaining module and configured to obtain, for each new sentence, a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library, and, in response to at least one first similarity of the current new sentence falling within a threshold range, to obtain the semantic tag to which the new sentence belongs and put the new sentence into a first database; and an extraction module, connected to the filtering module and configured to extract at least one new sentence from the first database, convert it into a vectorized representation, and store it under the corresponding semantic tag in the sentence pattern resource library.
To solve the above technical problem, another technical solution adopted by the application is to provide an electronic device comprising a memory and a processor coupled to each other, wherein the memory stores program instructions and the processor is configured to execute the program instructions to implement the hot update method for a sentence pattern resource library described in any of the above embodiments, or the sentence pattern recommendation method described in any of the above embodiments.
To solve the above technical problem, another technical solution adopted by the application is to provide a storage device storing program instructions executable by a processor, the program instructions being used to implement the hot update method for a sentence pattern resource library described in any of the above embodiments, or the sentence pattern recommendation method described in any of the above embodiments.
Compared with the prior art, the beneficial effects of the application are as follows: in the hot update method for a sentence pattern resource library, the new sentences accumulated online by the front-end application system are first screened, and the new sentences that satisfy the threshold range are put into a first database; at least one new sentence can subsequently be retrieved from the first database and reflowed into the sentence pattern resource library, thereby hot-updating the sentence pattern resource library. This hot update approach is simple, and the sentences in the sentence pattern resource library can be updated in a timely manner, which improves the sentence pattern recommendation effect.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort, wherein:
FIG. 1 is a flowchart illustrating an embodiment of a hot update method for a sentence pattern resource library of the present application;
FIG. 2a is a flowchart illustrating an embodiment of obtaining the first similarity in step S102 of FIG. 1;
FIG. 2b is a schematic diagram of a model structure corresponding to obtaining the first similarity in step S102 of FIG. 1;
FIG. 3 is a flowchart illustrating an embodiment of a sentence pattern recommendation method of the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a hot update apparatus for a sentence pattern resource library of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device according to the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a storage device according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating an embodiment of a hot update method for a sentence pattern resource library of the present application. The hot update method comprises:
S101: A plurality of new sentences accumulated online by the front-end application system are obtained.
Specifically, the sentences may also be referred to as similar questions. The front-end application system may be a financial customer service system, and it may continuously accumulate new sentences that are entered manually or generated automatically by a machine through speech recognition. For example, the new sentence may be "how can you transfer a call later, perform a telephone bank login by self-help voice, and perform an opening of a short message reminding service after a password is reset?", or "hello, i want to ask my bank card not, and now do not ask that it is a short message reminder, and do you want to do a short message reminder and go to the service hall?", or "good, after a call is transferred to self-service voice, according to the prompt of the system, a card number and an identification number of the call are input, a six-digit telephone bank login password is set after the withdrawal password is input, and a short message can be opened to remind that the user is right according to the prompt? Can one be prompted by the system?".
When the number of new sentences accumulated by the front-end application system exceeds a threshold, steps S101 to S103 can be triggered. Optionally, the threshold may be 50, 100, or the like, and may be set manually.
S102: For each new sentence, a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library is obtained; in response to at least one first similarity of the current new sentence falling within a threshold range, the semantic tag to which the new sentence belongs is obtained, and the new sentence is put into the first database.
Specifically, referring to fig. 2a and fig. 2b, fig. 2a is a flowchart illustrating an embodiment of obtaining the first similarity in step S102 in fig. 1, and fig. 2b is a schematic diagram illustrating a model structure corresponding to the first similarity obtained in step S102 in fig. 1. The flowchart of one embodiment of the step S102 of obtaining the first similarity between the new sentence and each existing sentence in the current sentence resource library may be as follows:
s201: a first vector representation of the new sentence is obtained using the pre-trained model 10.
In particular toIn this embodiment, the pre-training model 10 may be a BERT model or the like; after the new sentence is input into the pre-training model 10, the embedding layer (embedding) of the pre-training model 10 maps the new sentence text into the dimension R(m×n)The first vector representation of (a) is provided with location information and context information and contains deep level associations with respective location vectors. Is formulated as follows:
E=BertEmb(T),E∈Rm×n
where E is the first vector representation and T is the new sentence.
S202: the first vector representation is encoded with encoder 12 to obtain a first jointly encoded vector.
Specifically, when the new sentence is a long text, the first vector representation encoded by RNN often causes a problem of gradient disappearance, resulting in the previous content in the new sentence being ignored by the model. In this embodiment, the first vector representation is encoded by using a coding layer (transforms-Encoder) of the Encoder 12, so that the problem of gradient disappearance can be solved, and a first joint coding vector containing context information can be output through the coding layer of the Encoder 12. Is formulated as follows:
TC=TransformerEncoder(E),H∈Rm×h
wherein, TCIs the first joint encoded vector and E is the first vector representation.
S203: obtaining cosine similarity between the first joint coding vector and a second joint coding vector of each existing sentence in the current sentence pattern resource library; wherein, the cosine similarity is used as the first similarity.
Specifically, the second joint code vector obtaining process of each existing sentence in the current sentence resource library is similar to the above steps S201 to S202, and will not be described in detail herein; for example, a second vector representation of an existing sentence may be obtained using the pre-trained model 10 in FIG. 2 b; the second vector representation is encoded with encoder 12 to obtain a second combined encoded vector.
Further, the cosine similarity measures the cosine of the angle between the first joint encoding vector and the second joint encoding vector: the inner product of the two vectors (the sum of the products of corresponding elements) is divided by the product of their moduli. This is formulated as follows:
$\mathrm{similarity} = \cos(\theta) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{i} A_i B_i}{\sqrt{\sum_{i} A_i^2} \, \sqrt{\sum_{i} B_i^2}}$
where similarity is the first similarity, A represents the first joint encoding vector, B represents the second joint encoding vector, A_i represents the i-th element of the first joint encoding vector, and B_i represents the i-th element of the second joint encoding vector.
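The cosine similarity defined above can be sketched in a few lines of Python. This is a minimal illustration and not the patent's implementation: it assumes each joint encoding vector has already been flattened or pooled into a one-dimensional list of floats (the source does not say how the m×h encoding is reduced), and the function name `cosine_similarity` and the sample vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of a and b divided by the product of their moduli."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical 1-D encodings of a new sentence and an existing sentence.
new_vec = [1.0, 2.0, 3.0]
existing_vec = [2.0, 4.0, 6.0]
first_similarity = cosine_similarity(new_vec, existing_vec)
print(first_similarity)  # close to 1.0, because the vectors are parallel
```

Vectors that point in the same direction give a value near 1, and orthogonal vectors give 0, regardless of their lengths.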
In another embodiment, before step S102, a process of setting a threshold range may be further included, which specifically includes:
A. Obtain a second similarity between the existing sentences under each semantic tag (also called a knowledge point) in the current sentence pattern resource library. Specifically, the second similarity may be a cosine similarity. When the current sentence pattern resource library is built, existing sentences can be grouped under different semantic tags by similarity, and two existing sentences whose similarity exceeds a certain value can be assigned to the same semantic tag. Assuming that the current sentence pattern resource library includes 10 semantic tags, each of which corresponds to 100 existing sentences, a second similarity between any two existing sentences under each semantic tag can be obtained.
B. Obtain the maximum similarity value and the minimum similarity value among all the second similarities under all the semantic tags. Specifically, after the second similarities are obtained in step A, they may be sorted from high to low to obtain the maximum similarity value S_max and the minimum similarity value S_min.
C. Set the maximum threshold of the threshold range according to the maximum similarity value, and set the minimum threshold of the threshold range according to the minimum similarity value. For example, the product of a first coefficient λ_1 and the maximum similarity value S_max may be taken as the maximum threshold P_max, and the product of a second coefficient λ_2 and the minimum similarity value S_min may be taken as the minimum threshold P_min, where the first coefficient λ_1 and the second coefficient λ_2 are greater than 0 and less than or equal to 1. In this case, the threshold range is [P_min, P_max] = [λ_2·S_min, λ_1·S_max]. Optionally, the first coefficient and the second coefficient may be the same.
This way of setting the threshold range is simple, and it can expand the semantic range covered by the similar questions (i.e., sentences). If the maximum threshold P_max of the threshold range is set too high, the new sentences that are screened out are very similar to the existing sentences and add little value; if the minimum threshold P_min is set too low, the selected new sentences are far from the existing sentences and likewise add little value. In summary, the first coefficient and the second coefficient may be set to about 0.8. Of course, in other application scenarios the threshold range may be predefined manually; for example, it may be directly defined as [0.5, 0.8].
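Steps A to C can be sketched as follows. This is an illustrative sketch only: it assumes the pairwise second similarities under each semantic tag have already been computed, and the function name `threshold_range`, the dictionary layout and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def threshold_range(
    second_sims_by_tag: Dict[str, List[float]],
    lambda_1: float = 0.8,
    lambda_2: float = 0.8,
) -> Tuple[float, float]:
    """Return (P_min, P_max) = (lambda_2 * S_min, lambda_1 * S_max)."""
    if not (0 < lambda_1 <= 1 and 0 < lambda_2 <= 1):
        raise ValueError("coefficients must be in (0, 1]")
    # Step B: pool the second similarities of all tags and take the extremes.
    all_sims = [s for sims in second_sims_by_tag.values() for s in sims]
    if not all_sims:
        raise ValueError("no second similarities available")
    s_max, s_min = max(all_sims), min(all_sims)
    # Step C: scale the extremes by the two coefficients.
    return lambda_2 * s_min, lambda_1 * s_max


# Hypothetical pairwise similarities under two semantic tags.
sims = {"L1": [0.90, 0.70, 0.95], "L2": [0.60, 0.85]}
p_min, p_max = threshold_range(sims)
print(p_min, p_max)  # roughly 0.48 and 0.76
```

With both coefficients at 0.8, the lower bound drops below the smallest in-library similarity, which is how the range admits new sentences that extend the semantic coverage, while the upper bound excludes near-duplicates.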
Further, in step S102, obtaining the semantic tag to which the new sentence belongs and putting the new sentence into the first database, in response to at least one first similarity of the current new sentence falling within the threshold range, may be implemented as follows: A. For each semantic tag in the current sentence pattern resource library, obtain a first number, which is the number of existing sentences under the current semantic tag whose first similarity to the new sentence is within the threshold range, and obtain the ratio of the first number to a second number, which is the number of existing sentences under the current semantic tag. B. Take the semantic tag corresponding to the maximum ratio as the semantic tag of the new sentence, and put the new sentence into the first database. Optionally, the first database may be an ES (Elasticsearch) database. This process for determining the semantic tag of the new sentence is simple and easy to implement.
In an application scenario, assume that the current sentence pattern resource library contains 100 existing sentences divided between two semantic tags, with 60 existing sentences under the L1 semantic tag and 40 existing sentences under the L2 semantic tag. A first similarity is calculated between the new sentence and each of the 100 existing sentences, giving 100 first similarities in total. It is first determined whether any of the 100 first similarities falls within the threshold range. If all 100 first similarities are outside the threshold range, the new sentence is unrelated to the semantic tags in the current sentence pattern resource library and can be discarded. If at least one of the 100 first similarities is within the threshold range, the semantic tag to which the new sentence belongs is determined by comparing, for each semantic tag, the proportion of its existing sentences that fall within the range. For example, suppose the first similarity between the new sentence and 30 existing sentences under the L1 semantic tag is within the threshold range, and the first similarity between the new sentence and 25 existing sentences under the L2 semantic tag is within the threshold range; since 30/60 < 25/40, the new sentence is assigned to the L2 semantic tag and put into the first database.
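The ratio rule in this scenario can be sketched as follows. The sketch is illustrative only: the function name `assign_semantic_tag`, the input layout (first similarities grouped by tag) and the concrete similarity values are hypothetical, and the tie-breaking behavior (the first tag encountered wins) is an assumption, since the source does not address ties.

```python
from typing import Dict, List, Optional


def assign_semantic_tag(
    first_sims_by_tag: Dict[str, List[float]],
    p_min: float,
    p_max: float,
) -> Optional[str]:
    """Return the tag with the largest in-range ratio, or None to discard."""
    best_tag: Optional[str] = None
    best_ratio = 0.0
    for tag, sims in first_sims_by_tag.items():
        if not sims:
            continue
        # First number: existing sentences whose similarity is in range.
        first_number = sum(1 for s in sims if p_min <= s <= p_max)
        # Second number: all existing sentences under this tag.
        ratio = first_number / len(sims)
        if ratio > best_ratio:
            best_tag, best_ratio = tag, ratio
    return best_tag


# The scenario from the text: 30 of 60 in range under L1, 25 of 40 under L2.
sims = {
    "L1": [0.6] * 30 + [0.2] * 30,
    "L2": [0.6] * 25 + [0.2] * 15,
}
print(assign_semantic_tag(sims, p_min=0.5, p_max=0.8))  # L2, since 0.5 < 0.625
```

Returning `None` corresponds to the case in which every first similarity lies outside the threshold range and the new sentence is discarded.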
Generally, the first database has a first maximum storage quantity M1, which may be configured according to actual requirements. In response to the number of new sentences stored in the first database exceeding the first maximum storage quantity M1, at least one new sentence that was stored (i.e., entered) earlier in the first database may be deleted. This design controls the total number of new sentences that subsequently flow back to the sentence pattern resource library, which ensures operating efficiency. In this embodiment, a new sentence that satisfies the threshold range may first be stored in the first database, and the step of deleting the sentences with earlier entry times may be performed afterwards. In other embodiments, it may first be determined whether the first database can accommodate the current new sentence; if so, the new sentence is stored directly in the first database; if not, the sentences with earlier entry times are deleted from the first database, and the new sentence is then stored.
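The capacity rule for the first database can be sketched with a bounded queue. This is an in-memory stand-in for illustration, not an Elasticsearch client: the class name `FirstDatabase` and its methods are hypothetical, and it follows the "store first, then drop the earliest entry" variant described above.

```python
from collections import deque
from typing import Deque, List, Tuple


class FirstDatabase:
    """Bounded store that evicts the earliest-entered sentence when full."""

    def __init__(self, max_size: int) -> None:
        if max_size <= 0:
            raise ValueError("max_size (M1) must be positive")
        # A deque with maxlen discards from the left (oldest) on overflow.
        self._items: Deque[Tuple[str, str]] = deque(maxlen=max_size)

    def put(self, sentence: str, tag: str) -> None:
        """Store a screened new sentence together with its semantic tag."""
        self._items.append((sentence, tag))

    def items(self) -> List[Tuple[str, str]]:
        """Return stored (sentence, tag) pairs, oldest first."""
        return list(self._items)

    def __len__(self) -> int:
        return len(self._items)


db = FirstDatabase(max_size=2)  # M1 = 2, chosen only for the demonstration
for text in ["sentence a", "sentence b", "sentence c"]:
    db.put(text, "L1")
print(db.items())  # "sentence a" has been evicted
```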
S103: at least one new sentence is extracted from the first database, and the new sentence is vectorized and stored under the corresponding semantic tag in the sentence pattern resource library.
Specifically, the new sentences stored in the first database may not only include the current batch obtained by filtering in step S102, but also include the historical batch obtained by filtering in step S102 but not extracted and not deleted. Before the step S103 extracts a new sentence from the first database, an upper limit value of the new sentence that can be extracted by each semantic tag in the sentence pattern resource library may be set; the specific process can be as follows: and obtaining the total number N of the semantic tags in the current sentence pattern resource library, and taking the ratio M1/N of the first maximum storage quantity M1 of the first database to the total number N as an upper limit value of the new sentence which can be reflowed under each semantic tag. The upper limit value is set so that the number of new sentences reflowed under each semantic label does not differ too much, and the sentence recommendation effect in the subsequent application process is good.
Further, when at least one new sentence is extracted from the first database in step S103, the number of the extracted new sentences having the same semantic tag is less than or equal to the upper limit value M1/N. At this time, after the new sentence is extracted, the data related to the new sentence is deleted from the first database.
For each semantic tag, when the total number of new sentences belonging to the semantic tag contained in the first database is less than or equal to the upper limit value M1/N, all new sentences belonging to the semantic tag in the first database may be reflowed to the current sentence pattern resource library together.
Alternatively, for each semantic tag, when the total number of new sentences belonging to the semantic tag in the first database is greater than the upper limit value M1/N, M1/N of the new sentences belonging to the semantic tag in the first database may be reflowed to the current sentence pattern resource library together. The M1/N new sentences to be reflowed may be selected according to the time at which they entered the first database; for example, the M1/N new sentences that entered the first database most recently may be reflowed.
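The per-tag upper limit and the "most recent first" selection can be sketched as follows. The sketch is illustrative: the function name `extract_for_reflow` and the list-of-pairs layout are hypothetical, and rounding M1/N down with integer division is an assumption, since the source does not say how a non-integer quotient is handled.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Entry = Tuple[str, str]  # (sentence, semantic tag)


def extract_for_reflow(
    entries: List[Entry], m1: int, n_tags: int
) -> Tuple[List[Entry], List[Entry]]:
    """Split entries (oldest first) into (extracted, remaining)."""
    limit = m1 // n_tags  # upper limit per semantic tag (assumed floor)
    taken: DefaultDict[str, int] = defaultdict(int)
    extracted: List[Entry] = []
    remaining: List[Entry] = []
    # Walk from newest to oldest so later entries are preferred.
    for sentence, tag in reversed(entries):
        if taken[tag] < limit:
            taken[tag] += 1
            extracted.append((sentence, tag))
        else:
            remaining.append((sentence, tag))
    # Restore chronological order in both outputs.
    extracted.reverse()
    remaining.reverse()
    return extracted, remaining


stored = [("a", "L1"), ("b", "L1"), ("c", "L1"), ("d", "L2")]
extracted, remaining = extract_for_reflow(stored, m1=4, n_tags=2)
print(extracted)  # [('b', 'L1'), ('c', 'L1'), ('d', 'L2')]
print(remaining)  # [('a', 'L1')]
```

The `remaining` list is what stays in the first database after the extracted sentences are deleted from it.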
In addition, in step S103, storing the vectorized new sentence under the corresponding semantic tag in the sentence pattern resource library may be implemented as follows: the resource compilation engine of the engine is called to vectorize all the reflowed new sentences; the vectorized reflow resources (i.e., the new sentences) are written into the resource update directory file and merged into the current sentence pattern resource library; and the engine periodically detects and loads new resources. In the next round of hot update, or when the sentence pattern resource library is used for sentence pattern recommendation, the new sentences merged into the library can be regarded as existing sentences in the library.
In addition, in this embodiment, the sentence pattern resource library may also have a second maximum storage quantity M2, which may be configured according to actual requirements. In response to the number of existing sentences stored in the sentence pattern resource library exceeding the second maximum storage quantity M2, at least one existing sentence that was stored (i.e., entered) earlier in the sentence pattern resource library may be deleted. This design controls the total number of sentences in the sentence pattern resource library, which ensures operating efficiency. In this embodiment, the step of deleting the existing sentences with earlier entry times may be performed after the extracted new sentences have been reflowed to the sentence pattern resource library. In other embodiments, before the new sentences are reflowed, it may be determined whether the sentence pattern resource library can accommodate the current batch of new sentences; if so, the batch is reflowed directly to the sentence pattern resource library; if not, existing sentences with earlier entry times are deleted from the sentence pattern resource library, and the batch is then reflowed.
In the hot update method for a sentence pattern resource library provided in steps S101 to S103, the new sentences accumulated online by the front-end application system are first screened, and the new sentences that satisfy the threshold range are put into the first database; at least one new sentence can subsequently be retrieved from the first database and reflowed into the sentence pattern resource library, thereby hot-updating the sentence pattern resource library. This hot update approach is simple, and the sentences in the sentence pattern resource library can be updated in a timely manner, which improves the sentence pattern recommendation effect.
Referring to FIG. 3, FIG. 3 is a flowchart illustrating an embodiment of a sentence pattern recommendation method according to the present application. The sentence pattern recommendation method includes:
S301: An input sentence is obtained.
Specifically, the user can input a sentence (which may also be referred to as a similar question) by voice, a keyboard, or the like.
S302: a third similarity between the input sentence and each existing sentence in the sentence pattern resource library is obtained.
Specifically, the sentence pattern resource library is updated by the hot update method of any of the above embodiments. The processor can call the deep learning model to vectorize the received input sentence, compare it with each existing sentence in the sentence pattern resource library built into the engine in advance, and calculate the third similarity.
Optionally, step S302 may be implemented as follows: a second vector representation of the input sentence is obtained using the pre-trained model 10 in FIG. 2b; the second vector representation is encoded with the encoder 12 to obtain a third joint encoding vector; and the cosine similarity between the third joint encoding vector and a fourth joint encoding vector of each existing sentence in the sentence pattern resource library is obtained, the cosine similarity being the third similarity.
S303: and outputting a plurality of existing sentences with higher third similarity.
Specifically, all the third similarities obtained in step S302 may be sorted, and the K existing sentences with the highest third similarities may be returned; the value of K may be set manually.
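The top-K selection in step S303 can be sketched with the standard library. This is an illustrative sketch: the function name `recommend`, the mapping from existing sentence to third similarity, and the sample values are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def recommend(third_sims: Dict[str, float], k: int) -> List[Tuple[str, float]]:
    """Return the k existing sentences with the highest third similarity."""
    if k <= 0:
        raise ValueError("k must be positive")
    # nlargest returns results sorted from highest to lowest similarity.
    return heapq.nlargest(k, third_sims.items(), key=lambda item: item[1])


# Hypothetical third similarities for three existing sentences.
scores = {"sentence a": 0.2, "sentence b": 0.9, "sentence c": 0.5}
print(recommend(scores, k=2))  # [('sentence b', 0.9), ('sentence c', 0.5)]
```

`heapq.nlargest` avoids sorting the whole library when K is small relative to the number of existing sentences; a full sort followed by slicing would give the same result.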
Referring to FIG. 4, FIG. 4 is a schematic structural diagram of an embodiment of a hot update apparatus for a sentence pattern resource library of the present application, which includes an obtaining module 20, a filtering module 22 and an extracting module 24.
The obtaining module 20 is configured to obtain a plurality of new sentences accumulated online by the front-end application system. The filtering module 22 is connected to the obtaining module 20 and is configured to obtain, for each new sentence, a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library, and, in response to at least one first similarity of the current new sentence falling within a threshold range, to obtain the semantic tag to which the new sentence belongs and put the new sentence into the first database. The extracting module 24 is connected to the filtering module 22 and is configured to extract at least one new sentence from the first database, convert it into a vectorized representation, and store it under the corresponding semantic tag in the sentence pattern resource library.
In one embodiment, as shown in FIG. 4, the hot update apparatus provided by the present application further includes a first setting module 26, connected to the filtering module 22 and configured to, before the filtering module 22 executes: obtain second similarities between the existing sentences under each semantic tag in the current sentence pattern resource library; obtain the maximum similarity value and the minimum similarity value among all the second similarities under all the semantic tags; and set the maximum threshold of the threshold range according to the maximum similarity value and the minimum threshold of the threshold range according to the minimum similarity value.
Optionally, the step of setting the maximum threshold of the threshold range according to the maximum similarity value and setting the minimum threshold of the threshold range according to the minimum similarity value includes: taking a first product of the first coefficient and the maximum similarity value as a maximum threshold value and a second product of the second coefficient and the minimum similarity value as a minimum threshold value; wherein the first coefficient and the second coefficient are greater than 0 and less than or equal to 1.
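As a rough illustration of this threshold setting, the following Python sketch scales the largest and smallest second similarities by the two coefficients. It assumes the second similarities have already been computed and grouped by semantic tag; the function name `threshold_range`, the data layout, and the default coefficient values are assumptions made for the example.

```python
def threshold_range(second_similarities, first_coeff=1.0, second_coeff=1.0):
    """Return (min_threshold, max_threshold) for filtering new sentences.

    second_similarities: dict mapping a semantic tag to the list of pairwise
    similarities between existing sentences under that tag (assumed layout).
    """
    # The description requires both coefficients to lie in (0, 1].
    if not (0 < first_coeff <= 1 and 0 < second_coeff <= 1):
        raise ValueError("coefficients must be greater than 0 and at most 1")
    # Pool the second similarities of all semantic tags.
    all_values = [s for sims in second_similarities.values() for s in sims]
    max_threshold = first_coeff * max(all_values)   # first product
    min_threshold = second_coeff * min(all_values)  # second product
    return min_threshold, max_threshold


# Hypothetical second similarities under two semantic tags.
sims = {"weather": [0.8, 0.6], "music": [0.9, 0.5]}
print(threshold_range(sims, first_coeff=0.5, second_coeff=0.5))
```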
In another embodiment, the step, in the filtering module 22, of obtaining the semantic tag to which the new sentence belongs in response to at least one first similarity related to the current new sentence being within the threshold range specifically includes: for each semantic tag in the current sentence pattern resource library, obtaining a first number of existing sentences under the semantic tag whose first similarity to the new sentence is within the threshold range, and obtaining a ratio of the first number to a second number of existing sentences under the semantic tag; and taking the semantic tag corresponding to the maximum ratio as the semantic tag of the new sentence.
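The ratio-based tag assignment can be sketched in Python as follows. This is only one possible reading: it assumes the second number is the total count of existing sentences under the tag, treats the threshold range as inclusive at both ends, and returns `None` when no first similarity falls in the range. The function name `assign_semantic_tag` and the data layout are hypothetical.

```python
def assign_semantic_tag(first_similarities, min_threshold, max_threshold):
    """Pick the semantic tag with the largest in-range ratio, or None.

    first_similarities: dict mapping a semantic tag to the list of first
    similarities between the new sentence and each existing sentence under
    that tag (assumed layout).
    """
    best_tag, best_ratio = None, 0.0
    for tag, sims in first_similarities.items():
        if not sims:
            continue  # a tag without existing sentences cannot be scored
        # First number: existing sentences whose similarity is in range.
        first_number = sum(1 for s in sims if min_threshold <= s <= max_threshold)
        # Second number: assumed here to be all existing sentences under the tag.
        ratio = first_number / len(sims)
        if ratio > best_ratio:
            best_tag, best_ratio = tag, ratio
    return best_tag


# Hypothetical first similarities for one new sentence.
sims = {"weather": [0.7, 0.2, 0.65], "music": [0.6, 0.1, 0.1, 0.1]}
print(assign_semantic_tag(sims, 0.5, 0.8))
```

Dividing by the number of existing sentences keeps tags with many stored sentences from winning on raw count alone.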
In yet another embodiment, the first database is provided with a maximum storage number. The hot update apparatus provided by the present application may further include a second setting module 28, connected to the extracting module 24 and configured to, before the extracting module 24 executes, obtain the total number of semantic tags in the current sentence pattern resource library and use the ratio of the maximum storage number to the total number as an upper limit on the number of new sentences that can be flowed back under each semantic tag. In this case, when the extracting module 24 extracts at least one new sentence from the first database, the number of extracted new sentences having the same semantic tag is less than or equal to the upper limit.
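A minimal Python sketch of this per-tag cap is given below. It assumes the first database is a list of (sentence, tag) pairs in storage order and that the ratio is rounded down with integer division, which the application does not specify; the function name `extract_for_update` is hypothetical.

```python
from collections import defaultdict


def extract_for_update(first_database, tags_in_library, max_storage):
    """Extract new sentences, at most `upper_limit` per semantic tag.

    first_database: list of (sentence, tag) pairs in storage order (assumed).
    tags_in_library: the semantic tags of the current resource library.
    """
    # Upper limit per tag: maximum storage number divided by the tag count.
    # Rounding down is an assumption of this sketch.
    upper_limit = max_storage // len(tags_in_library)
    taken = defaultdict(int)
    extracted = []
    for sentence, tag in first_database:
        if taken[tag] < upper_limit:
            extracted.append((sentence, tag))
            taken[tag] += 1
    return extracted


# Hypothetical first database with three candidates under tag "t1".
db = [("a", "t1"), ("b", "t1"), ("c", "t1"), ("d", "t2")]
print(extract_for_update(db, ["t1", "t2"], max_storage=4))
```

Capping each tag keeps a single frequently used tag from dominating the sentences written back into the resource library.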
In another embodiment, the hot update apparatus provided by the present application may further include a deletion module, connected to the filtering module 22 and configured to delete at least one of the earliest-stored new sentences in the first database in response to the number of new sentences stored in the first database exceeding the maximum storage number.
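The deletion behaviour resembles a first-in, first-out buffer. The Python sketch below is one way to model it, assuming sentences are evicted strictly in storage order; the class name `FirstDatabase` and its methods are hypothetical and are not part of the application.

```python
from collections import deque


class FirstDatabase:
    """Bounded store that drops the earliest-stored sentences when full."""

    def __init__(self, max_storage):
        self.max_storage = max_storage
        self._items = deque()  # oldest entry on the left

    def put(self, sentence, tag):
        self._items.append((sentence, tag))
        # Delete the earliest-stored entries once the limit is exceeded.
        while len(self._items) > self.max_storage:
            self._items.popleft()

    def items(self):
        return list(self._items)


db = FirstDatabase(max_storage=2)
for s in ["a", "b", "c"]:
    db.put(s, "t1")
print(db.items())
```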
In another embodiment, the step, in the filtering module 22, of obtaining the first similarity between the new sentence and each existing sentence in the current sentence pattern resource library specifically includes: obtaining a first vector representation of the new sentence using a pre-trained model; encoding the first vector representation with an encoder to obtain a first jointly encoded vector; and obtaining the cosine similarity between the first jointly encoded vector and a second jointly encoded vector of each existing sentence, the cosine similarity being used as the first similarity.
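The cosine similarity step can be sketched in plain Python as follows. The sketch assumes the jointly encoded vectors have already been produced by the pre-trained model and the encoder, neither of which is modelled here, and it returns 0.0 for a zero vector as a convention of this example. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen for this sketch
    return dot / (norm_u * norm_v)


def first_similarities(new_vector, existing_vectors):
    """First similarity between a new sentence and each existing sentence."""
    return [cosine_similarity(new_vector, v) for v in existing_vectors]


# Hypothetical two-dimensional encoded vectors.
print(first_similarities([1.0, 0.0], [[2.0, 0.0], [0.0, 3.0]]))
```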
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of an embodiment of an electronic device of the present application. The electronic device includes a memory 32 and a processor 30 coupled to each other, wherein the memory 32 stores program instructions, and the processor 30 is configured to execute the program instructions to implement any of the above hot update methods for the sentence pattern resource library or the sentence pattern recommendation method. Specifically, the electronic device includes, but is not limited to, a desktop computer, a notebook computer, a tablet computer, or a server. Further, the processor 30 may also be referred to as a CPU (Central Processing Unit). The processor 30 may be an integrated circuit chip having signal processing capabilities. The processor 30 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 30 may be jointly implemented by a plurality of integrated circuit chips.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of an embodiment of a storage device 40 of the present application, which stores program instructions 400 executable by a processor, the program instructions 400 being used to implement any of the above hot update methods for the sentence pattern resource library or the sentence pattern recommendation method.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical division, and other divisions are possible in an actual implementation; for instance, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or in another form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above are merely embodiments of the present application and are not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (11)

1. A hot update method for a sentence pattern resource library, comprising:
obtaining a plurality of new sentences accumulated online by a front-end application system;
for each new sentence, obtaining a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library; and in response to at least one first similarity related to the current new sentence being within a threshold range, obtaining a semantic tag to which the new sentence belongs and putting the new sentence into a first database;
and extracting at least one new sentence from the first database, performing vectorized representation on the new sentence, and storing the new sentence under the corresponding semantic tag in the sentence pattern resource library.
2. The hot update method of claim 1, wherein before the step of obtaining the semantic tag to which the new sentence belongs and putting the new sentence into the first database in response to at least one first similarity related to the current new sentence being within the threshold range, the method comprises:
obtaining a second similarity between a plurality of existing sentences under each semantic label in the sentence pattern resource library;
obtaining the maximum similarity value and the minimum similarity value in all the second similarity values under all the semantic labels;
setting a maximum threshold of the threshold range according to the maximum similarity value, and setting a minimum threshold of the threshold range according to the minimum similarity value.
3. The hot update method of claim 2, wherein the step of setting a maximum threshold of the threshold range according to the maximum similarity value and setting a minimum threshold of the threshold range according to the minimum similarity value comprises:
taking a first product of a first coefficient and the maximum similarity value as the maximum threshold value and a second product of a second coefficient and the minimum similarity value as the minimum threshold value; wherein the first coefficient and the second coefficient are greater than 0 and less than or equal to 1.
4. The hot update method of claim 1, wherein the step of obtaining the semantic tag to which the new sentence belongs in response to at least one first similarity related to the current new sentence being within the threshold range comprises:
for each semantic tag in the current sentence pattern resource library, obtaining a first number of existing sentences under the semantic tag whose first similarity to the new sentence is within the threshold range, and obtaining a ratio of the first number to a second number of existing sentences under the semantic tag;
and taking the semantic label corresponding to the maximum ratio as the semantic label of the new sentence.
5. The hot update method of claim 1, wherein
the first database is provided with a maximum storage number;
before the step of extracting at least one new sentence from the first database, the method comprises: obtaining the total number of semantic tags in the current sentence pattern resource library, and taking the ratio of the maximum storage number to the total number as an upper limit on the number of new sentences that can be flowed back under each semantic tag;
when at least one new sentence is extracted from the first database, the number of the extracted new sentences with the same semantic tags is less than or equal to the upper limit value.
6. The hot update method of claim 5, further comprising:
deleting at least one of the earliest-stored new sentences in the first database in response to the number of new sentences stored in the first database exceeding the maximum storage number.
7. The hot update method of claim 1, wherein the step of obtaining a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library comprises:
obtaining a first vector representation of the new sentence using a pre-trained model;
encoding the first vector representation with an encoder to obtain a first jointly encoded vector;
obtaining cosine similarity between the first joint coding vector and a second joint coding vector of each existing sentence; wherein the cosine similarity is used as the first similarity.
8. A sentence pattern recommendation method, comprising:
obtaining an input sentence;
obtaining a third similarity between the input sentence and each existing sentence in a sentence pattern resource library, wherein the sentence pattern resource library is updated by the hot update method of any one of claims 1-7;
and outputting a plurality of existing sentences having the highest third similarities.
9. A hot update apparatus for a sentence pattern resource library, comprising:
an obtaining module, configured to obtain a plurality of new sentences accumulated online by a front-end application system;
a filtering module, connected to the obtaining module and configured to obtain, for each new sentence, a first similarity between the new sentence and each existing sentence in the current sentence pattern resource library, and, in response to at least one first similarity related to the current new sentence being within a threshold range, to obtain a semantic tag to which the new sentence belongs and put the new sentence into a first database;
and an extracting module, connected to the filtering module and configured to extract at least one new sentence from the first database, perform vectorized representation on the new sentence, and store the new sentence under the corresponding semantic tag in the sentence pattern resource library.
10. An electronic device comprising a memory and a processor coupled to each other, the memory storing program instructions, and the processor being configured to execute the program instructions to implement the hot update method for a sentence pattern resource library of any one of claims 1 to 7 or the sentence pattern recommendation method of claim 8.
11. A storage device storing program instructions executable by a processor, the program instructions being used to implement the hot update method for a sentence pattern resource library of any one of claims 1 to 7 or the sentence pattern recommendation method of claim 8.
CN202111626947.XA 2021-12-28 2021-12-28 Thermal updating method of sentence pattern resource library, sentence pattern recommendation method and related device Pending CN114386431A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111626947.XA CN114386431A (en) 2021-12-28 2021-12-28 Thermal updating method of sentence pattern resource library, sentence pattern recommendation method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111626947.XA CN114386431A (en) 2021-12-28 2021-12-28 Thermal updating method of sentence pattern resource library, sentence pattern recommendation method and related device

Publications (1)

Publication Number Publication Date
CN114386431A true CN114386431A (en) 2022-04-22

Family

ID=81198735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111626947.XA Pending CN114386431A (en) 2021-12-28 2021-12-28 Thermal updating method of sentence pattern resource library, sentence pattern recommendation method and related device

Country Status (1)

Country Link
CN (1) CN114386431A (en)

Similar Documents

Publication Publication Date Title
US11227118B2 (en) Methods, devices, and systems for constructing intelligent knowledge base
US11216701B1 (en) Unsupervised representation learning for structured records
CN111950287B (en) Entity identification method based on text and related device
CN112215008A (en) Entity recognition method and device based on semantic understanding, computer equipment and medium
CN110427453B (en) Data similarity calculation method, device, computer equipment and storage medium
CN112926308B (en) Method, device, equipment, storage medium and program product for matching text
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN112579733A (en) Rule matching method, rule matching device, storage medium and electronic equipment
CN112183030A (en) Event extraction method and device based on preset neural network, computer equipment and storage medium
CN112906368A (en) Industry text increment method, related device and computer program product
CN112307738A (en) Method and device for processing text
CN116756281A (en) Knowledge question-answering method, device, equipment and medium
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN116127013A (en) Personal sensitive information knowledge graph query method and device
CN115221954A (en) User portrait method, device, electronic equipment and storage medium
CN114386431A (en) Thermal updating method of sentence pattern resource library, sentence pattern recommendation method and related device
CN114638221A (en) Business model generation method and device based on business requirements
CN114328894A (en) Document processing method, document processing device, electronic equipment and medium
CN113434631A (en) Emotion analysis method and device based on event, computer equipment and storage medium
CN112199954A (en) Disease entity matching method and device based on voice semantics and computer equipment
CN112948561A (en) Method and device for automatically expanding question-answer knowledge base
CN117290510B (en) Document information extraction method, model, electronic device and readable medium
CN114238574B (en) Intention recognition method based on artificial intelligence and related equipment thereof
CN111476037B (en) Text processing method and device, computer equipment and storage medium
CN114757198A (en) Similar method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination