CN111553151A - Question recommendation method and device based on field similarity calculation and server - Google Patents

Info

Publication number
CN111553151A
Authority
CN
China
Prior art keywords
field
fields
similarity
question
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010255040.6A
Other languages
Chinese (zh)
Inventor
赵亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202010255040.6A priority Critical patent/CN111553151A/en
Publication of CN111553151A publication Critical patent/CN111553151A/en
Priority to PCT/CN2021/078031 priority patent/WO2021196934A1/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application is applicable to the technical field of computers and provides a question recommendation method and device based on field similarity calculation, a storage medium and a server. The question recommendation method comprises the following steps: acquiring an input first question sentence; performing word segmentation on the first question sentence and extracting each field contained in it; comparing the extracted fields one by one with the fields in a pre-constructed field data table, finding the extracted fields that also appear in the field data table, and determining them as target fields; calculating the similarity between the target field and each of the other fields in the field data table; and selecting, from the other fields, the field with the highest similarity and using it to replace the target field in the first question sentence to obtain a recommended second question sentence. With this method, new question sentences that better match the user's expectations can be generated, which improves the accuracy of question recommendation in an intelligent question-answering system.

Description

Question recommendation method and device based on field similarity calculation and server
Technical Field
The application belongs to the technical field of computers, and particularly relates to a question recommendation method and device based on field similarity calculation, a storage medium and a server.
Background
A natural-language intelligent question-answering system works as follows: a user inputs a question; the system applies natural language processing to the question to generate a structured query language statement; the system then searches a database or knowledge base for the answer according to that statement; and finally the query result is returned to the user.
At present, intelligent question-answering systems recommend questions in two main ways. One is real-time recommendation, which recommends according to the question the user is currently inputting; the other is similar-question recommendation. Real-time recommendation is usually triggered by keywords: for example, when the user inputs "by", enumerated field names are recommended. Similar-question recommendation randomly replaces keywords of the same type in the original question to assemble a new question. However, the questions produced by these two approaches are often far from what the user expects, so the accuracy of question recommendation is low.
Disclosure of Invention
In view of this, the present application provides a question recommendation method, device, storage medium, and server based on field similarity calculation, which can improve the precision of the intelligent question-answering system in recommending questions.
In a first aspect, an embodiment of the present application provides a question recommendation method based on field similarity calculation, including:
acquiring an input first question sentence;
performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;
comparing the extracted fields one by one with the fields in a pre-constructed field data table, finding the extracted fields that also appear in the field data table, and determining them as target fields;
respectively calculating the similarity between the target field and each other field except the target field in the field data table;
and selecting, from the other fields, the field with the highest similarity and using it to replace the target field in the first question sentence to obtain a recommended second question sentence.
Further, the similarity between the target field and any other field in the field data table can be calculated by the following steps:
calculating a similarity index of the target field and any one of the other fields by combining the character string and the enumerated value of the target field and the character string and the enumerated value of any one of the other fields, wherein the similarity index is a parameter for measuring the similarity between the two fields;
and calculating the similarity between the target field and any one of the other fields according to the similarity indexes of the target field and any one of the other fields.
Further, the calculating the similarity indicator between the target field and any other field may include:
calculating a character string similarity index, a character string length similarity index, an enumerated value number similarity index and an enumerated value length similarity index of the target field and any other field;
the calculating the similarity between the target field and the any one of the other fields according to the similarity indicator between the target field and the any one of the other fields may include:
and calculating an average value or a weighted average value of the character string similarity index, the character string length similarity index, the enumerated value number similarity index and the enumerated value length similarity index as the similarity of the target field and any other field.
Further, the character string similarity index may be calculated by the following formula:
Figure BDA0002436974510000021
wherein s1 represents the character string similarity index, sim represents the number of identical character strings in the two fields, short represents the character string length of the shorter of the two fields, long represents the character string length of the longer of the two fields, and α is a hyperparameter used to control the impact of the character string on the similarity;
the string length similarity index may be calculated using the following formula:
Figure BDA0002436974510000031
wherein s2 represents the character string length similarity index, short represents the character string length of the shorter of the two fields, and long represents the character string length of the longer of the two fields;
the enumerated value number similarity index may be calculated by the following formula:
Figure BDA0002436974510000032
wherein s3 represents the enumerated value number similarity index, min represents the number of enumerated values of the field with fewer enumerated values of the two fields, and max represents the number of enumerated values of the field with more enumerated values of the two fields;
the enumerated value length similarity index may be calculated using the following formula:
Figure BDA0002436974510000033
wherein s4 represents the enumerated value length similarity index, avg_min represents the average enumerated value length of the field whose enumerated values are shorter on average of the two fields, and avg_max represents the average enumerated value length of the field whose enumerated values are longer on average of the two fields.
Further, the separately calculating the similarity between the target field and each of the other fields in the field data table except the target field may include:
searching all historical question sentences of the user who inputs the first question sentence;
constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user;
and calculating the similarity between the target field and each other field according to the co-occurrence matrix.
Further, the determining the similarity between the target field and each of the other fields according to the co-occurrence matrix may include:
extracting the field vector of the target field and the field vector of each other field from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field co-occurs with one of the fields in the field data table in the same historical question sentence of the user;
and respectively calculating cosine similarity between the field vector of the target field and the field vectors of each other field to obtain the similarity between the target field and each other field.
Further, after constructing the co-occurrence matrix according to the historical question statement, the method may further include:
determining, according to the co-occurrence matrix, the field in the field data table that co-occurs with the target field most often in the same historical question sentences of the user;
and selecting that most frequently co-occurring field and using it to replace the target field in the first question sentence to obtain a recommended third question sentence.
In a second aspect, an embodiment of the present application provides a question recommendation device based on field similarity calculation, including:
the question acquisition module is used for acquiring an input first question sentence;
the word segmentation module is used for carrying out word segmentation on the first question sentence and extracting each field contained in the first question sentence;
the field comparison module is used for comparing the extracted fields one by one with the fields in a pre-constructed field data table, finding the extracted fields that also appear in the field data table, and determining them as target fields;
the field similarity calculation module is used for calculating the similarity between the target field and each other field except the target field in the field data table;
and the question recommending module is used for selecting, from the other fields, the field with the highest similarity and using it to replace the target field in the first question sentence to obtain a recommended second question sentence.
In a third aspect, an embodiment of the present application provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the question recommendation method as set forth in the first aspect of the embodiment of the present application.
In a fourth aspect, an embodiment of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the question recommendation method as set forth in the first aspect of the embodiment of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the steps of the question recommendation method according to the first aspect.
In the question recommendation method based on field similarity calculation, after the fields of an input question sentence are extracted, they are compared one by one with the fields in a pre-constructed field data table, and the extracted fields that also appear in the field data table are determined as target fields. The similarity between the target field and each other field in the field data table is then calculated, and the field with the highest similarity is used to replace the target field in the question sentence, giving the recommended question sentence. Compared with the conventional approach of randomly replacing keywords of the same type in a sentence, the present application takes the similarity among all preset fields into account and replaces the field in the original question sentence with the most similar field, so it can generate new question sentences that better match the user's expectations and improve the accuracy of the questions recommended by the intelligent question-answering system.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on them without inventive effort.
FIG. 1 is a flowchart of a first embodiment of a question recommendation method provided by an embodiment of the present application;
FIG. 2 is a flowchart of a second embodiment of a question recommendation method provided by an embodiment of the present application;
FIG. 3 is a flowchart of a third embodiment of a question recommendation method provided by an embodiment of the present application;
FIG. 4 is a block diagram of an embodiment of a question recommendation device provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a server according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail. Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
The application provides a question recommendation method, a question recommendation device, a storage medium and a server, which can improve the precision of question recommendation of an intelligent question-answering system.
It should be understood that the question recommendation method based on field similarity calculation proposed in the embodiments of the present application may be executed by various types of servers or terminal devices.
Referring to fig. 1, a first embodiment of a method for recommending a question based on field similarity calculation in an embodiment of the present application includes:
101. Acquiring an input first question sentence;
The user can enter the question to be asked, namely the first question sentence, on a terminal device by voice or manually, and the question sentence is sent to the intelligent question-answering system on the server side.
102. Performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;
After the server obtains the question sentence, it segments the sentence into words and extracts each field contained in it. Any of the word segmentation methods in the prior art may be used, for example jieba. If the user asks "What is the average age of men in different occupations?", then after jieba segmentation the fields are the list ["male", "different", "occupation", "average", "age", "how", "?"].
103. Comparing the extracted fields one by one with the fields in a pre-constructed field data table, finding the extracted fields that also appear in the field data table, and determining them as target fields;
After obtaining each field in the first question sentence by word segmentation, the server compares the extracted fields one by one with the fields in the pre-constructed field data table, finds the extracted fields that also appear in the field data table, and determines them as target fields.
The pre-constructed field data table may be as shown in table 1 below:
TABLE 1
Name | Occupation | Gender | Age | Personal after-tax monthly income | Industry
Zhang San | Police officer | Male | 35 | 4500 | Security
Li Si | Waiter | Female | 29 | 4000 | Service
In Table 1, "name", "occupation", "gender", "age", "personal after-tax monthly income" and "industry" are all fields of the field data table, and "Zhang San", "Li Si", "waiter", "police officer", "male", "female", "security", "service" and the like are enumerated values of the fields. When constructing the field data table, the fields and enumerated values are written into a data structure; for example, in Python the data may be stored in a dict, forming a dict-type data structure table.
In addition, the fields can be added to jieba's custom dictionary so that field keywords are not cut apart when the question input by the user is segmented. For example, for the field keyword "personal after-tax monthly income", jieba would by default cut it into 3 fields (such as "personal" and "monthly income"), whereas jieba will not cut it if "personal after-tax monthly income" is added to jieba's custom dictionary.
Suppose the extracted fields are the list ["male", "different", "occupation", "average", "age", "how", "?"]. Comparing these fields with the fields in Table 1 finds that "occupation" and "age" appear in both, so they are the target fields. There may be one or more target fields.
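The lookup described above can be sketched in Python as follows. This is a minimal illustration only: the `field_table` dict and the `find_target_fields` helper are hypothetical names, the English strings stand in for the original Chinese fields, and the token list is assumed to come from a segmenter such as jieba.

```python
# Hypothetical in-memory form of Table 1: field name -> enumerated values.
field_table = {
    "name": ["Zhang San", "Li Si"],
    "occupation": ["police officer", "waiter"],
    "gender": ["male", "female"],
    "age": [35, 29],
    "personal after-tax monthly income": [4500, 4000],
    "industry": ["security", "service"],
}


def find_target_fields(tokens, table):
    """Return the tokens that are also field names in the table.

    Order of first appearance is kept and duplicates are dropped.
    """
    seen = set()
    targets = []
    for token in tokens:
        if token in table and token not in seen:
            seen.add(token)
            targets.append(token)
    return targets


# Tokens as they might come out of word segmentation (assumed, not real jieba output).
tokens = ["male", "different", "occupation", "average", "age", "how", "?"]
target_fields = find_target_fields(tokens, field_table)
print(target_fields)
```

Note that "male" is not selected: it is an enumerated value of "gender", not a field name, so only keys of the dict count as matches in this sketch.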
104. Respectively calculating the similarity between the target field and each other field except the target field in the field data table;
After the target field is determined, the similarity between the target field and each other field except the target field in the field data table is calculated. For example, in the example of Table 1, for the target field "occupation", the similarity between "occupation" and each of "name", "gender", "age", "personal after-tax monthly income" and "industry" is calculated.
Further, the similarity between the target field and any other field in the field data table can be calculated by the following steps:
(1) calculating a similarity index of the target field and any one of the other fields by combining the character string and the enumerated value of the target field and the character string and the enumerated value of any one of the other fields, wherein the similarity index is a parameter for measuring the similarity between the two fields;
(2) and calculating the similarity between the target field and any one of the other fields according to the similarity indexes of the target field and any one of the other fields.
Related attribute parameters of strings and enumerated values, such as the length of the string, or the number and class of enumerated values, are important parameters that may be used to determine the degree of similarity between fields. Further, the calculating the similarity indicator between the target field and any other field may include: and calculating a character string similarity index, a character string length similarity index, an enumerated value number similarity index and an enumerated value length similarity index of the target field and any other field.
Specifically, the character string similarity index may be calculated by using the following formula:
Figure BDA0002436974510000081
wherein s1 represents the character string similarity index, sim represents the number of identical character strings in the two fields (i.e. the target field and the any one of the other fields), short represents the character string length of the shorter of the two fields, long represents the character string length of the longer of the two fields, and α is a hyperparameter used to control the impact of the character string on the similarity.
Figure BDA0002436974510000082
has the effect of compressing s1 to between 0 and 1. For example, for the two fields "personal after-tax monthly income" and "personal income tax", when calculating s1 of the two, sim = 3 ("person", "tax"), short = 5, and long = 7 (lengths are counted in characters of the original Chinese field names).
The string length similarity index may be calculated using the following formula:
Figure BDA0002436974510000091
wherein s2 represents the character string length similarity index, short represents the character string length of the shorter of the two fields (i.e. the target field and the any one of the other fields), and long represents the character string length of the longer of the two fields. For example, calculating s2 for the fields "personal after-tax monthly income" and "occupation" gives
Figure BDA0002436974510000092
The enumerated value number similarity index may be calculated by the following formula:
Figure BDA0002436974510000093
wherein s3 represents the enumerated value number similarity index, min represents the number of enumerated values of the field with fewer enumerated values of the two fields, and max represents the number of enumerated values of the field with more enumerated values of the two fields. For example, if in the field data table the "occupation" field has 6 enumerated values (police officer, nurse, teacher, programmer, student, clerk) and the "gender" field has 2 enumerated values (male and female), then s3 of the two is
Figure BDA0002436974510000094
The enumerated value length similarity index may be calculated using the following formula:
Figure BDA0002436974510000095
wherein s4 represents the enumerated value length similarity index, avg_min represents the average enumerated value length of the field whose enumerated values are shorter on average of the two fields, and avg_max represents the average enumerated value length of the field whose enumerated values are longer on average of the two fields. For example, if the average enumerated value length of the "occupation" field is (2+2+2+3+2+2)/6 = 2.17 and that of the "gender" field is (1+1)/2 = 1, then s4 of the two is
Figure BDA0002436974510000096
Specifically, the calculating the similarity between the target field and the any one of the other fields according to the similarity index between the target field and the any one of the other fields may include:
calculating an average value or a weighted average value of the character string similarity index, the character string length similarity index, the enumerated value number similarity index and the enumerated value length similarity index as the similarity between the target field and any one of the other fields, such as the similarity between two fields
Figure BDA0002436974510000101
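The formula images are not reproduced in this text, so the exact expressions for the indices are unknown here. The Python sketch below is therefore only an illustration under a stated assumption: it treats the length and count indices as a simple smaller-over-larger ratio, which matches the variable definitions (short/long, min/max, avg_min/avg_max) but is not confirmed as the patent's formula. The helper names `ratio` and `combine` are hypothetical. The combining step (plain or weighted average) is taken from the description.

```python
def ratio(a, b):
    """Smaller-over-larger ratio in [0, 1].

    ASSUMPTION: this ratio form is a guess at the shape of the length and
    count indices; the actual formulas are not available in this text.
    """
    lo, hi = sorted((a, b))
    return lo / hi if hi else 1.0


def combine(indices, weights=None):
    """Average (or weighted average) of similarity indices, as described."""
    if weights is None:
        weights = [1.0] * len(indices)
    return sum(w * s for w, s in zip(weights, indices)) / sum(weights)


# Inputs taken from the worked examples in the description.
s2_example = ratio(5, 7)       # string lengths short = 5, long = 7
s3_example = ratio(2, 6)       # 2 vs. 6 enumerated values
s4_example = ratio(1, 13 / 6)  # average enumerated value lengths 1 vs. 2.17

# Combining four indices for one field pair; these index values are made up.
similarity = combine([0.4, 0.5, 0.6, 0.7])
weighted = combine([1.0, 0.0], weights=[3, 1])
```

With a weighted average, the weights let one index (for example the character string similarity) count more than the others; how to choose them is left open by the description.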
105. And selecting the field with the highest similarity in the other fields, and replacing the target field in the first question sentence to obtain the recommended second question sentence.
After the similarity between the target field and each other field except the target field in the field data table is calculated, the field with the highest similarity among the other fields is selected and used to replace the target field in the first question sentence, giving a recommended second question sentence. For example, if the first question sentence is "How is the average income of different occupations in Shanghai distributed?", where "occupation" is the target field, and the field in the field data table most similar to "occupation" is "industry", then "industry" replaces "occupation" in the first question sentence to obtain the second question sentence: "How is the average income of different industries in Shanghai distributed?". Finally, the second question sentence is recommended to the user, which completes one round of question recommendation.
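The selection and replacement in step 105 can be sketched as below. The function name `recommend_question` and the similarity scores are hypothetical, chosen only for illustration; the sketch does plain string replacement and does not handle English inflection (plurals and the like), so the example sentence is phrased to avoid it.

```python
def recommend_question(question, target_field, similarities):
    """Replace target_field in question with the most similar other field.

    similarities maps each other field name to its similarity with
    target_field; the scores passed in below are illustrative only.
    """
    best_field = max(similarities, key=similarities.get)
    return question.replace(target_field, best_field)


# Made-up similarity scores between "occupation" and the other fields.
similarities = {"name": 0.21, "gender": 0.35, "age": 0.30, "industry": 0.62}

first_question = "How is average income in Shanghai distributed by occupation?"
second_question = recommend_question(first_question, "occupation", similarities)
print(second_question)
```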
In this embodiment, after the fields of the input question sentence are extracted, they are compared one by one with the fields in a pre-constructed field data table, and the extracted fields that also appear in the field data table are determined as target fields. The similarity between the target field and each other field in the field data table is then calculated, and the field with the highest similarity is used to replace the target field in the question sentence, giving the recommended question sentence. Compared with the conventional approach of randomly replacing keywords of the same type in a sentence, this embodiment takes the similarity among the preset fields into account and replaces the field in the original question sentence with the most similar field, so it can generate new question sentences that better match the user's expectations and improve the accuracy of the questions recommended by the intelligent question-answering system.
Referring to fig. 2, a second embodiment of a method for recommending a question based on field similarity calculation according to the embodiment of the present application includes:
201. Acquiring an input first question sentence;
202. Performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;
203. Comparing the extracted fields one by one with the fields in a pre-constructed field data table, finding the extracted fields that also appear in the field data table, and determining them as target fields;
Steps 201-203 are the same as steps 101-103; refer to the description of steps 101-103 for details.
204. Searching all historical question sentences of the user who inputs the first question sentence;
After determining the target field, the server may obtain the historical question record of the user who input the first question sentence and find all of that user's historical question sentences.
205. Constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user;
A co-occurrence matrix is then constructed from the historical question sentences. It records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user. For example, a co-occurrence matrix M constructed from the user's historical question sentences is:
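$$M = \begin{bmatrix} 0 & 18 & 27 & 22 & 3 \\ 18 & 0 & 2 & 15 & 5 \\ 27 & 2 & 0 & 30 & 10 \\ 22 & 15 & 30 & 0 & 21 \\ 3 & 5 & 10 & 21 & 0 \end{bmatrix}$$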
the co-occurrence matrix M corresponds to table 2 below:
TABLE 2
Co-occurrence matrix M | Occupation | Gender | Age | Personal after-tax monthly income | Industry
Occupation | - | 18 | 27 | 22 | 3
Gender | 18 | - | 2 | 15 | 5
Age | 27 | 2 | - | 30 | 10
Personal after-tax monthly income | 22 | 15 | 30 | - | 21
Industry | 3 | 5 | 10 | 21 | -
In Table 2, the value corresponding to "gender" and "occupation" is 18, which indicates that among all the historical question sentences of the user, "gender" and "occupation" co-occur in the same sentence 18 times. For example, all question sentences the user has asked are stored in advance, such as "the relationship between different genders and occupations", "the proportion of unmarried people by occupation and gender", …, "the correlation between different genders and different occupations". Each of these questions contains both "occupation" and "gender"; if there are 18 such questions, the two fields co-occur 18 times.
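Building such a matrix can be sketched as follows, assuming each historical question has already been reduced to the list of field names it contains. The function name `build_cooccurrence` and the tiny history below are hypothetical, not data from the application; the diagonal is left at 0.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(history, fields):
    """Count, for every pair of fields, the questions in which both appear.

    history: list of lists of field names, one list per historical question.
    fields:  ordered field names; the order fixes the matrix rows and columns.
    """
    counts = Counter()
    for question_fields in history:
        # Keep only known fields, once each, in a stable order.
        present = sorted(set(question_fields) & set(fields))
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1

    index = {name: i for i, name in enumerate(fields)}
    matrix = [[0] * len(fields) for _ in fields]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n  # co-occurrence is symmetric
    return matrix


fields = ["occupation", "gender", "age"]
history = [
    ["gender", "occupation"],
    ["occupation", "gender"],
    ["age", "occupation"],
    ["age"],
]
matrix = build_cooccurrence(history, fields)
print(matrix)
```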
206. Calculating the similarity between the target field and each other field except the target field in the field data table according to the co-occurrence matrix;
After the co-occurrence matrix is constructed, the similarity between the target field and each of the other fields in the field data table can be calculated according to the co-occurrence matrix.
Specifically, step 206 may include:
(1) extracting the field vector of the target field and the field vector of each other field from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field co-occurs with one of the fields in the field data table in the same historical question sentence of the user;
(2) calculating the cosine similarity between the field vector of the target field and the field vector of each other field to obtain the similarity between the target field and each other field.
In the co-occurrence matrix, each field corresponds to one field vector: the row (or, equivalently, the column) of the matrix in which the field is located. For example, the field vector of "occupation" is [0, 18, 27, 22, 3], and the field vector of "gender" is [18, 0, 2, 15, 5]. After the field vectors are extracted, the cosine similarity between the field vector of the target field and the field vector of each other field is calculated, which gives the similarity between the target field and each other field. For example, if the target field is "occupation", the similarity between the target field and the other field "gender" equals the cosine similarity of the vector [0, 18, 27, 22, 3] and the vector [18, 0, 2, 15, 5].
207. Selecting the field with the highest similarity among the other fields and replacing the target field in the first question sentence with it to obtain the recommended second question sentence.
Step 207 is the same as step 105, and the related description of step 105 can be referred to.
In this embodiment of the application, after the fields of the input question sentence are extracted, each extracted field is compared one by one with the fields in a pre-constructed field data table, and an extracted field that also appears in the field data table is determined as the target field. Then, all historical question sentences input by the user are retrieved, a co-occurrence matrix is constructed, and the similarity between the target field and each other field in the field data table is calculated according to the co-occurrence matrix. The field with the highest similarity is found and used to replace the target field in the question sentence, thereby obtaining the recommended question sentence. Compared with the first embodiment of the present application, this embodiment provides a specific way of calculating the similarity between the target field and each of the other fields.
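Steps 206 and 207 can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the application: the matrix values are those of Table 2 with the diagonal taken as 0 (as in the field vectors quoted above), while the English field names and the helper names `cosine` and `most_similar_field` are assumptions introduced here.

```python
import math

# Field order and counts follow Table 2; the diagonal is taken as 0.
FIELDS = ["occupation", "gender", "age", "monthly_income", "industry"]
M = [
    [0, 18, 27, 22, 3],
    [18, 0, 2, 15, 5],
    [27, 2, 0, 30, 10],
    [22, 15, 30, 0, 21],
    [3, 5, 10, 21, 0],
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_field(target):
    """Return the other field whose row vector is closest to the target's row."""
    t = FIELDS.index(target)
    scores = {
        name: cosine(M[t], M[i]) for i, name in enumerate(FIELDS) if i != t
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = most_similar_field("occupation")
second_question = "average income in Shanghai by occupation".replace("occupation", best)
```

With these values, the cosine similarity between "occupation" and "gender" is 399 / (sqrt(1546) * sqrt(578)), roughly 0.42, and the field selected for "occupation" is "industry" (roughly 0.87). Note that this differs from the field that co-occurs most often with "occupation", which is "age" with 27 co-occurrences.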
Referring to fig. 3, a third embodiment of a question recommendation method based on field similarity calculation in the embodiment of the present application includes:
301. acquiring an input first question sentence;
302. performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;
303. comparing the fields with fields in a field data table which is constructed in advance one by one, finding out the fields which are the same as the fields in the field data table, and determining the fields as target fields;
304. searching all historical question sentences of the user who inputs the first question sentence;
305. constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the times of the common occurrence of any two fields in the field data table in the same historical question sentences of the user;
Steps 301-305 are the same as steps 201-205; refer to the related description of steps 201-205.
306. Determining, according to the co-occurrence matrix, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user;
307. selecting that field and replacing the target field in the first question sentence with it to obtain a recommended third question sentence.
After the co-occurrence matrix is constructed, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user can be determined from the matrix. That field is then selected and used to replace the target field in the first question sentence, thereby obtaining the recommended third question sentence.
For example, suppose the first question sentence is "How is the average income of different occupations in Shanghai distributed?", where "occupation" is the target field. In the co-occurrence matrix M, the field with the largest number of co-occurrences with "occupation" is "age" (27 times), so "occupation" in the first question sentence may be replaced with "age", thereby obtaining the third question sentence: "How is the average income of different ages in Shanghai distributed?".
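The example above can be reproduced with a short Python sketch. The values come from Table 2 (diagonal taken as 0); the English field names, the sample question text, and the function names `most_cooccurring_field` and `recommend_third_question` are assumptions for illustration only.

```python
# Field order and counts follow Table 2; the diagonal is taken as 0.
FIELDS = ["occupation", "gender", "age", "monthly_income", "industry"]
M = [
    [0, 18, 27, 22, 3],
    [18, 0, 2, 15, 5],
    [27, 2, 0, 30, 10],
    [22, 15, 30, 0, 21],
    [3, 5, 10, 21, 0],
]


def most_cooccurring_field(target):
    """Return (field, count) for the field that co-occurs most with the target."""
    row = M[FIELDS.index(target)]
    others = [(name, count) for name, count in zip(FIELDS, row) if name != target]
    # max() keeps the first maximum, so ties resolve in table order.
    return max(others, key=lambda pair: pair[1])


def recommend_third_question(question, target):
    """Replace the target field in the question with its top co-occurring field."""
    replacement, _ = most_cooccurring_field(target)
    return question.replace(target, replacement)


field, count = most_cooccurring_field("occupation")
third = recommend_third_question("average income in Shanghai by occupation", "occupation")
```

Here `field` is "age" with a count of 27, so the recommended sentence becomes "average income in Shanghai by age". Unlike the second embodiment, no vector comparison is needed: only the target field's own row of the matrix is read.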
In this embodiment of the application, after the fields of the input question sentence are extracted, each extracted field is compared one by one with the fields in a pre-constructed field data table, and an extracted field that also appears in the field data table is determined as the target field. Then, all historical question sentences input by the user are retrieved and a co-occurrence matrix is constructed. According to the co-occurrence matrix, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user is determined, and that field replaces the target field in the first question sentence to obtain a recommended third question sentence. Compared with the second embodiment of the present application, this embodiment also uses the co-occurrence matrix but generates the question sentence directly from co-occurrence counts rather than by calculating the similarity between fields.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Corresponding to the question recommendation method based on field similarity calculation described in the above embodiments, Fig. 4 shows a block diagram of a question recommendation device based on field similarity calculation provided in the embodiments of the present application. For convenience of description, only the parts related to the embodiments of the present application are shown.
Referring to fig. 4, the apparatus includes:
a question acquiring module 401, configured to acquire an input first question sentence;
a word segmentation module 402, configured to perform word segmentation on the first question sentence, and extract each field included in the first question sentence;
a field comparison module 403, configured to compare the extracted fields one by one with the fields in a pre-constructed field data table, find an extracted field that also appears in the field data table, and determine it as a target field;
a field similarity calculation module 404, configured to calculate the similarity between the target field and each of the other fields in the field data table except the target field;
a question recommending module 405, configured to select the field with the highest similarity among the other fields and replace the target field in the first question sentence with it to obtain a recommended second question sentence.
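One possible way to picture how modules 401 to 405 fit together is the Python sketch below. It is a hedged illustration, not the device of the application: the class name `QuestionRecommender`, its method names, the whitespace-based segmentation, and the toy similarity scores are all assumptions. The similarity function is passed in so that any of the calculation schemes described in the embodiments could be substituted.

```python
from typing import Callable, List, Optional


class QuestionRecommender:
    """Toy pipeline mirroring modules 401 to 405 (names are illustrative)."""

    def __init__(self, field_table: List[str],
                 similarity: Callable[[str, str], float]):
        self.field_table = field_table
        self.similarity = similarity  # stands in for module 404

    def segment(self, question: str) -> List[str]:
        # Module 402 stand-in: whitespace split instead of real word segmentation.
        return [token.strip(".,?!\"'") for token in question.split()]

    def find_target(self, tokens: List[str]) -> Optional[str]:
        # Module 403: first extracted field that also appears in the field table.
        for token in tokens:
            if token in self.field_table:
                return token
        return None

    def recommend(self, question: str) -> Optional[str]:
        # Module 401 receives the question; module 405 performs the replacement.
        target = self.find_target(self.segment(question))
        others = [f for f in self.field_table if f != target]
        if target is None or not others:
            return None
        best = max(others, key=lambda f: self.similarity(target, f))
        return question.replace(target, best)


# Hypothetical similarity scores, invented purely for this example.
toy_scores = {("occupation", "gender"): 0.4, ("occupation", "industry"): 0.9}
recommender = QuestionRecommender(
    ["occupation", "gender", "industry"],
    lambda a, b: toy_scores.get((a, b), 0.0),
)
second = recommender.recommend("average income by occupation in Shanghai")
```

With the toy scores above, `second` is "average income by industry in Shanghai". A question that contains no field from the table yields `None`, since no target field can be determined.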
Further, the field similarity calculation module may include:
a similarity index calculation unit, configured to calculate a similarity index between the target field and any one of the other fields based on the character string and enumerated values of the target field and the character string and enumerated values of that other field, where the similarity index is a parameter used for measuring the degree of similarity between the two fields;
a first field similarity calculation unit, configured to calculate the similarity between the target field and that other field according to the similarity index between the target field and that other field.
Further, the similarity index calculation unit may specifically be configured to: calculate a character string similarity index, a character string length similarity index, an enumerated value number similarity index, and an enumerated value length similarity index of the target field and any one of the other fields;
the first field similarity calculation unit may be specifically configured to: calculate an average value or a weighted average value of the character string similarity index, the character string length similarity index, the enumerated value number similarity index, and the enumerated value length similarity index as the similarity between the target field and that other field.
Further, the character string similarity index may be calculated by using the following formula:
Figure BDA0002436974510000151
wherein s1 represents the character string similarity index, sim represents the number of identical character strings of the two fields, short represents the character string length of the field with the shorter character string of the two fields, long represents the character string length of the field with the longer character string of the two fields, and α is a hyper-parameter for controlling the influence of the character string on the similarity;
the string length similarity index may be calculated using the following formula:
Figure BDA0002436974510000152
wherein s2 represents the character string length similarity index, short represents the character string length of the field with the shorter character string of the two fields, and long represents the character string length of the field with the longer character string of the two fields;
the enumerated value number similarity index may be calculated by the following formula:
Figure BDA0002436974510000161
wherein s3 represents the enumerated value number similarity index, min represents the number of enumerated values of the field having fewer enumerated values of the two fields, and max represents the number of enumerated values of the field having more enumerated values of the two fields;
the enumerated value length similarity index may be calculated using the following formula:
Figure BDA0002436974510000162
wherein s4 represents the enumerated value length similarity index, avg_min represents the average enumerated value length of the field whose enumerated values are shorter on average of the two fields, and avg_max represents the average enumerated value length of the field whose enumerated values are longer on average of the two fields.
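The final combining step, in which the four similarity indexes are averaged or weighted-averaged into one similarity value, can be sketched in Python as follows. The formulas for the individual indexes are given only as images in the source and are not reproduced here, so the sketch takes already-computed index values as input. The function name `combine_indices` and the sample values and weights are assumptions for illustration.

```python
def combine_indices(indices, weights=None):
    """Return the average (or weighted average) of similarity index values.

    indices: index values such as [s1, s2, s3, s4], computed elsewhere.
    weights: optional non-negative weights; equal weights are used if omitted.
    """
    if weights is None:
        weights = [1.0] * len(indices)
    if len(weights) != len(indices):
        raise ValueError("indices and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, indices)) / total


# Hypothetical index values for one pair of fields.
sample = [0.5, 0.8, 1.0, 0.7]
plain = combine_indices(sample)
weighted = combine_indices(sample, weights=[2, 1, 1, 0])
```

With the sample values, the plain average is 0.75. The weighted call doubles the influence of the first index and ignores the last, which gives 0.7.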
Further, the field similarity calculation module may include:
a historical sentence searching unit, configured to search for all historical question sentences of the user who input the first question sentence;
a co-occurrence matrix construction unit, configured to construct a co-occurrence matrix according to the historical question sentences, where the co-occurrence matrix records the number of times any two fields in the field data table co-occur in the same historical question sentence of the user;
a second field similarity calculation unit, configured to calculate the similarity between the target field and each other field according to the co-occurrence matrix.
Further, the second field similarity calculation unit may include:
a field vector extraction subunit, configured to extract, from the co-occurrence matrix, the field vector of the target field and the field vector of each of the other fields, where each element of a field vector is the number of times the corresponding field co-occurs with one of the fields in the field data table in the same historical question sentence of the user;
a cosine similarity calculation subunit, configured to calculate the cosine similarity between the field vector of the target field and the field vector of each other field to obtain the similarity between the target field and each other field.
Further, the field similarity calculation module may further include:
a field determination unit, configured to determine, according to the co-occurrence matrix, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user;
a field replacing module, configured to select that field and replace the target field in the first question sentence with it to obtain the recommended third question sentence.
An embodiment of the present application further provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of any one of the question recommendation methods based on field similarity calculation shown in Figs. 1 to 3.
An embodiment of the present application further provides a server, which includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the steps of any one of the question recommendation methods based on field similarity calculation shown in Figs. 1 to 3.
An embodiment of the present application further provides a computer program product which, when run on a server, causes the server to execute the steps of any one of the question recommendation methods based on field similarity calculation shown in Figs. 1 to 3.
Fig. 5 is a schematic diagram of a server according to an embodiment of the present application. As shown in Fig. 5, the server 5 of this embodiment includes: a processor 50, a memory 51, and computer-readable instructions 52 stored in the memory 51 and executable on the processor 50. The processor 50, when executing the computer-readable instructions 52, implements the steps in the above embodiments of the question recommendation method based on field similarity calculation, such as steps 101 to 105 shown in Fig. 1. Alternatively, the processor 50, when executing the computer-readable instructions 52, implements the functions of the modules/units in the above device embodiments, such as the functions of modules 401 to 405 shown in Fig. 4.
Illustratively, the computer readable instructions 52 may be partitioned into one or more modules/units that are stored in the memory 51 and executed by the processor 50 to accomplish the present application. The one or more modules/units may be a series of computer-readable instruction segments capable of performing specific functions, which are used to describe the execution of the computer-readable instructions 52 in the server 5.
The server 5 may be a computing device such as a smartphone, a notebook computer, a palmtop computer, or a cloud server. The server 5 may include, but is not limited to, the processor 50 and the memory 51. Those skilled in the art will appreciate that Fig. 5 is merely an example of the server 5 and does not limit it; the server 5 may include more or fewer components than shown, combine certain components, or use different components. For example, the server 5 may also include input/output devices, network access devices, buses, and the like.
The processor 50 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 51 may be an internal storage unit of the server 5, such as a hard disk or internal memory of the server 5. The memory 51 may also be an external storage device of the server 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the server 5. Further, the memory 51 may include both an internal storage unit and an external storage device of the server 5. The memory 51 is used to store the computer-readable instructions and other programs and data required by the server. The memory 51 may also be used to temporarily store data that has been output or is to be output.
It should be noted that, because the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiments and are not described here again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include at least: any entity or device capable of carrying computer program code to a photographing apparatus/terminal apparatus, a recording medium, computer memory, Read-Only Memory (ROM), Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A question recommendation method based on field similarity calculation, characterized by comprising the following steps:
acquiring an input first question sentence;
performing word segmentation processing on the first question sentence, and extracting each field contained in the first question sentence;
comparing the fields with fields in a field data table which is constructed in advance one by one, finding out the fields which are the same as the fields in the field data table, and determining the fields as target fields;
respectively calculating the similarity between the target field and each other field except the target field in the field data table;
and selecting the field with the highest similarity in the other fields, and replacing the target field in the first question sentence to obtain the recommended second question sentence.
2. The question recommendation method of claim 1 wherein the similarity between the target field and any one of the other fields in the field data table is calculated by:
calculating a similarity index between the target field and any one of the other fields based on the character string and enumerated values of the target field and the character string and enumerated values of that other field, wherein the similarity index is a parameter for measuring the degree of similarity between the two fields;
and calculating the similarity between the target field and that other field according to the similarity index between the target field and that other field.
3. The question recommendation method of claim 2, wherein said calculating a similarity index between the target field and the any one of the other fields comprises:
calculating a character string similarity index, a character string length similarity index, an enumerated value number similarity index and an enumerated value length similarity index of the target field and any other field;
the calculating the similarity between the target field and the any one other field according to the similarity index between the target field and the any one other field includes:
and calculating an average value or a weighted average value of the character string similarity index, the character string length similarity index, the enumerated value number similarity index and the enumerated value length similarity index as the similarity of the target field and any other field.
4. The question recommendation method of claim 3, wherein the string similarity index is calculated using the following formula:
Figure FDA0002436974500000021
wherein s1 represents the character string similarity index, sim represents the number of identical character strings of the two fields, short represents the character string length of the field with the shorter character string of the two fields, long represents the character string length of the field with the longer character string of the two fields, and α is a hyper-parameter for controlling the influence of the character string on the similarity;
the character string length similarity index is calculated by adopting the following formula:
Figure FDA0002436974500000022
wherein s2 represents the character string length similarity index, short represents the character string length of the field with the shorter character string of the two fields, and long represents the character string length of the field with the longer character string of the two fields;
the enumeration value number similarity index is calculated by adopting the following formula:
Figure FDA0002436974500000023
wherein s3 represents the enumerated value number similarity index, min represents the number of enumerated values of the field having fewer enumerated values of the two fields, and max represents the number of enumerated values of the field having more enumerated values of the two fields;
the enumerated value length similarity index is calculated by adopting the following formula:
wherein s4 represents the enumerated value length similarity index, avg_min represents the average enumerated value length of the field whose enumerated values are shorter on average of the two fields, and avg_max represents the average enumerated value length of the field whose enumerated values are longer on average of the two fields.
5. The question recommendation method of claim 1, wherein said separately calculating the similarity between the target field and each of the other fields in the field data table except the target field comprises:
searching all historical question sentences of the user who inputs the first question sentence;
constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the times of the common occurrence of any two fields in the field data table in the same historical question sentences of the user;
and calculating the similarity between the target field and each other field according to the co-occurrence matrix.
6. The question recommendation method of claim 5, wherein said calculating a similarity between the target field and the respective other fields according to the co-occurrence matrix comprises:
extracting field vectors of the target fields and field vectors of each other field from the co-occurrence matrix respectively, wherein each element of the field vectors is the times of the common occurrence of the corresponding field and each field in the field data table in the same historical question sentence of the user;
and respectively calculating cosine similarity between the field vector of the target field and the field vectors of each other field to obtain the similarity between the target field and each other field.
7. The question recommendation method according to claim 5 or 6, after constructing a co-occurrence matrix from the historical question sentences, further comprising:
determining, according to the co-occurrence matrix, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user;
and selecting that field and replacing the target field in the first question sentence with it to obtain a recommended third question sentence.
8. A question recommendation apparatus based on field similarity calculation, comprising:
a question acquisition module, configured to acquire an input first question sentence;
a word segmentation module, configured to perform word segmentation on the first question sentence and extract each field contained in the first question sentence;
a field comparison module, configured to compare the extracted fields one by one with the fields in a pre-constructed field data table, find an extracted field that also appears in the field data table, and determine it as a target field;
a field similarity calculation module, configured to calculate the similarity between the target field and each other field in the field data table except the target field;
and a question recommending module, configured to select the field with the highest similarity among the other fields and replace the target field in the first question sentence with it to obtain a recommended second question sentence.
9. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the question recommendation method as claimed in any one of claims 1 to 7.
10. A server comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the question recommendation method according to any one of claims 1 to 7 when executing the computer program.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010255040.6A CN111553151A (en) 2020-04-02 2020-04-02 Question recommendation method and device based on field similarity calculation and server
PCT/CN2021/078031 WO2021196934A1 (en) 2020-04-02 2021-02-26 Question recommendation method and apparatus based on field similarity calculation, and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010255040.6A CN111553151A (en) 2020-04-02 2020-04-02 Question recommendation method and device based on field similarity calculation and server

Publications (1)

Publication Number Publication Date
CN111553151A true CN111553151A (en) 2020-08-18

Family

ID=72005557

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010255040.6A Pending CN111553151A (en) 2020-04-02 2020-04-02 Question recommendation method and device based on field similarity calculation and server

Country Status (2)

Country Link
CN (1) CN111553151A (en)
WO (1) WO2021196934A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148860A (en) * 2020-09-29 2020-12-29 中国银行股份有限公司 Question recommendation method and device for text robot
CN112417271A (en) * 2020-11-09 2021-02-26 杭州讯酷科技有限公司 Intelligent construction method of system with field recommendation
WO2021196934A1 (en) * 2020-04-02 2021-10-07 深圳壹账通智能科技有限公司 Question recommendation method and apparatus based on field similarity calculation, and server
CN113673252A (en) * 2021-08-12 2021-11-19 之江实验室 Automatic join recommendation method for data table based on field semantics

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139523A (en) * 2021-11-25 2022-03-04 北京中交兴路信息科技有限公司 Name comparison method and device, electronic equipment and medium
CN114385623A (en) * 2021-11-30 2022-04-22 北京达佳互联信息技术有限公司 Data table acquisition method, device, apparatus, storage medium, and program product

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103279486B (en) * 2013-04-24 2019-03-08 百度在线网络技术(北京)有限公司 It is a kind of that the method and apparatus of relevant search are provided
US10216802B2 (en) * 2015-09-28 2019-02-26 International Business Machines Corporation Presenting answers from concept-based representation of a topic oriented pipeline
CN106610972A (en) * 2015-10-21 2017-05-03 阿里巴巴集团控股有限公司 Query rewriting method and apparatus
CN109509010B (en) * 2017-09-15 2023-04-18 腾讯科技(北京)有限公司 Multimedia information processing method, terminal and storage medium
CN109147934B (en) * 2018-07-04 2023-04-11 平安科技(深圳)有限公司 Inquiry data recommendation method, device, computer equipment and storage medium
CN110162615B (en) * 2019-05-29 2021-08-24 北京市律典通科技有限公司 Intelligent question and answer method and device, electronic equipment and storage medium
CN111553151A (en) * 2020-04-02 2020-08-18 深圳壹账通智能科技有限公司 Question recommendation method and device based on field similarity calculation and server

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021196934A1 (en) * 2020-04-02 2021-10-07 深圳壹账通智能科技有限公司 Question recommendation method and apparatus based on field similarity calculation, and server
CN112148860A (en) * 2020-09-29 2020-12-29 中国银行股份有限公司 Question recommendation method and device for text robot
CN112148860B (en) * 2020-09-29 2024-05-24 中国银行股份有限公司 Question recommending method and device for text robot
CN112417271A (en) * 2020-11-09 2021-02-26 杭州讯酷科技有限公司 Intelligent construction method of system with field recommendation
CN112417271B (en) * 2020-11-09 2023-09-01 杭州讯酷科技有限公司 Intelligent system construction method with field recommendation
CN113673252A (en) * 2021-08-12 2021-11-19 之江实验室 Automatic join recommendation method for data table based on field semantics

Also Published As

Publication number Publication date
WO2021196934A1 (en) 2021-10-07

Similar Documents

Publication Publication Date Title
CN111553151A (en) Question recommendation method and device based on field similarity calculation and server
CN108804641B (en) Text similarity calculation method, device, equipment and storage medium
US20160275148A1 (en) Database query method and device
CN110377558B (en) Document query method, device, computer equipment and storage medium
US11861308B2 (en) Mapping natural language utterances to operations over a knowledge graph
CN110795524B (en) Main data mapping processing method and device, computer equipment and storage medium
CN111506721A (en) Question-answering system and construction method for domain knowledge graph
CN111428503B (en) Identification processing method and processing device for homonymous characters
CN110929498B (en) Method and device for calculating similarity of short text and readable storage medium
CN111291177A (en) Information processing method and device and computer storage medium
CN112580357A (en) Semantic parsing of natural language queries
WO2021012958A1 (en) Original text screening method, apparatus, device and computer-readable storage medium
CN110909532B (en) User name matching method and device, computer equipment and storage medium
JP5559750B2 (en) Advertisement processing apparatus, information processing system, and advertisement processing method
CN113704236A (en) Government affair system data quality evaluation method, device, terminal and storage medium
TWI603320B (en) Global spoken dialogue system
CN110717029A (en) Information processing method and system
CN111382246A (en) Text matching method, matching device and terminal
US11860876B1 (en) Systems and methods for integrating datasets
Sumesh et al. Natural Language Processing based Recommendation System for Courses
CN115437620B (en) Natural language programming method, device, equipment and storage medium
CN111259209B (en) User intention prediction method based on artificial intelligence, electronic device and storage medium
CN113204710A (en) Public opinion analysis method and device, terminal equipment and storage medium
CN117313846A (en) Knowledge graph completion data set construction method and device and electronic equipment
CN113468280A (en) Data cognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination