WO2021196934A1

WO2021196934A1 - Question recommendation method and apparatus based on field similarity calculation, and server

Info

Publication number: WO2021196934A1
Application number: PCT/CN2021/078031
Authority: WO
Inventors: 赵亮
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2020-04-02
Filing date: 2021-02-26
Publication date: 2021-10-07
Also published as: CN111553151A

Abstract

A question recommendation method and apparatus based on the field similarity calculation, and a server, which are suitable for the technical field of artificial intelligence. The question recommendation method comprises: obtaining an input first questioning statement (101); performing word segmentation processing on the first questioning statement to extract fields comprised therein (102); respectively comparing the fields with fields comprised in a pre-constructed field data table to find out the same field in the fields and the field data table, and determining the field as a target field (103); respectively calculating the similarity between the target field and each of other fields in the field data table other than the target field (104); and selecting a field having the highest similarity from the other fields, and replacing the target field in the first questioning statement with the field to obtain a recommended second questioning statement (105). By using the question recommendation method, a new question sentence more conforming to the expectation of a user can be generated, and the accuracy of question recommendation of an intelligent question answering system is improved.

Description

Question recommendation method, device and server based on field similarity calculation

This application claims priority to the Chinese patent application filed with the Chinese Patent Office on April 2, 2020, with application number 202010255040.6 and titled "A method, device and server for recommending a problem based on field similarity calculation", the entire content of which is incorporated in this application by reference.

Technical field

This application belongs to the field of artificial intelligence technology, and in particular relates to a question recommendation method, device, storage medium, and server based on field similarity calculation.

Background art

An intelligent question answering system based on natural language usually works as follows: the user enters a question sentence; the system performs natural language processing on the question sentence and generates a structured query language statement; it then uses that statement to look up the reply content in a database or knowledge base; and finally it returns the query result to the user.

At present, there are two main question recommendation methods for intelligent question answering systems. One is real-time recommendation, that is, recommendation based on the question currently being input by the user; the other is similar question recommendation. Real-time recommendation is often triggered by keywords; for example, when the user enters "by", enumerated field names are recommended. Similar question recommendation randomly replaces keywords of the same type in the original question to assemble a new question. However, the inventor has realized that the questions recommended by both methods are often far from the user's expectations, so the accuracy of question recommendation is low.

Summary of the invention

In view of this, this application proposes a question recommendation method, device, storage medium and server based on field similarity calculation, which can improve the accuracy of the question recommendation of the intelligent question answering system.

In a first aspect, an embodiment of the present application provides a question recommendation method based on field similarity calculation, including:

Obtain the input first question sentence;

Perform word segmentation processing on the first question sentence, and extract various fields contained therein;

Compare each of the fields one by one with the fields in the pre-built field data table, find the fields that appear both among the extracted fields and in the field data table, and determine them as target fields;

Respectively calculating the similarity between the target field and each other field in the field data table except the target field;

Select the field with the highest similarity among the other fields and use it to replace the target field in the first question sentence, obtaining a recommended second question sentence.

In the second aspect, an embodiment of the present application provides a question recommendation device based on field similarity calculation, including:

The question acquisition module is used to acquire the input first question sentence;

The word segmentation module is used to perform word segmentation processing on the first question sentence and extract each field contained therein;

A field comparison module, which is used to compare each field one by one with the fields in the pre-built field data table, find out the same fields that each field and the field data table have, and determine it as a target field;

A field similarity calculation module, configured to calculate the similarity between the target field and each other field in the field data table except the target field;

The question recommendation module is configured to select the field with the highest similarity among the other fields, replace the target field in the first question sentence, and obtain a recommended second question sentence.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the question recommendation method proposed in the first aspect of the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a server, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements the steps of the question recommendation method proposed in the first aspect of the embodiments of the present application.

In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the steps of the question recommendation method described in the first aspect.

In the question recommendation method based on field similarity calculation proposed in this application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and the fields that appear both among the extracted fields and in the field data table are determined as target fields. Then, the similarity between the target field and each other field in the field data table is calculated, and the field with the highest similarity is used to replace the target field in the question sentence, yielding the recommended question sentence. Compared with the conventional method of randomly replacing keywords of the same type in the sentence, this application takes the similarity between the preset fields into account and replaces the field in the original question sentence with the most similar field, which can generate new question sentences that better meet user expectations and improve the accuracy of the questions recommended by the intelligent question answering system.

Description of the drawings

FIG. 1 is a flowchart of a first embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 2 is a flowchart of a second embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 3 is a flowchart of a third embodiment of a question recommendation method provided by an embodiment of the present application;

FIG. 4 is a structural diagram of an embodiment of a question recommendation device provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a server provided by an embodiment of the present application.

Detailed description

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are proposed for a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted to avoid unnecessary details from obstructing the description of this application. In addition, in the description of the specification of this application and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the description, and cannot be understood as indicating or implying relative importance.

This application proposes a question recommendation method, device, storage medium, and server, which can improve the accuracy of the question recommendation by the intelligent question answering system.

It should be understood that the question recommendation method based on field similarity calculation proposed in the various embodiments of the present application is executed by various types of servers or terminal devices.

Referring to FIG. 1, a first embodiment of a method for recommending a question based on field similarity calculation in an embodiment of the present application includes:

101. Obtain the input first question sentence;

The user can enter the question to be asked, that is, the first question sentence, on the terminal device by voice or manually, and the question sentence is then sent to the intelligent question answering system on the server side.

102. Perform word segmentation processing on the first question sentence, and extract various fields contained therein;

After the server obtains the question sentence, it segments the sentence and extracts the fields it contains. Any of the word segmentation methods in the prior art can be used, for example jieba word segmentation. If the user asks "What is the average age of men in different occupations?", jieba word segmentation yields the field list ["male", "different", "occupation", "average", "age", "how", "?"].

103. Compare the fields one by one with the fields in the pre-built field data table, find out the same fields that the fields and the field data table have, and determine them as target fields;

After word segmentation yields the fields of the first question sentence, the server compares each field one by one with the fields in the pre-built field data table, finds the fields that appear both among the extracted fields and in the field data table, and determines them as target fields.

The pre-built field data table can be as shown in Table 1 below:

Table 1

Name (姓名)	Occupation (职业)	Gender (性别)	Age (年龄)	Personal monthly income after tax (个人税后月收入)	Industry (行业)
Zhang San (张三)	police (警察)	male (男)	35	4500	security (安保)
Li Si (李四)	waiter (服务员)	female (女)	29	4000	service (服务)
……	……	……	……	……	……

In Table 1, "Name", "Occupation", "Gender", "Age", "Personal monthly income after tax", and "Industry" are the fields of the field data table, while "Zhang San", "Li Si", "waiter", "police", "male", "female", "security", "service", etc. are enumerated values of those fields. When constructing the field data table, the fields and enumerated values are written into a data structure. For example, in Python, the dict type can be used to store the data, forming a dict-type data structure table.
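As a minimal sketch of the dict-type structure mentioned above, the two sample rows of Table 1 can be folded into a mapping from field name to enumerated values. The function name `build_field_table` and the choice to store each field's distinct values in a list are assumptions made for illustration; the source does not specify the layout of the dict.

```python
def build_field_table(header, rows):
    """Build {field name: [distinct enumerated values]} from tabular rows.

    Hypothetical helper: the exact dict layout is an assumption.
    """
    table = {field: [] for field in header}
    for row in rows:
        for field, value in zip(header, row):
            # Keep each enumerated value once, in first-seen order.
            if value not in table[field]:
                table[field].append(value)
    return table


# The header and the two sample rows of Table 1.
header = ["name", "occupation", "gender", "age",
          "personal monthly income after tax", "industry"]
rows = [
    ["Zhang San", "police", "male", 35, 4500, "security"],
    ["Li Si", "waiter", "female", 29, 4000, "service"],
]

field_table = build_field_table(header, rows)
print(field_table["occupation"])
```

The keys of `field_table` serve as the fields to match against, and the lists serve as the enumerated values used later when comparing fields.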

In addition, these fields can be added to jieba's custom dictionary so that field keywords are not split when the question sentence entered by the user is segmented. For example, by default jieba splits the field keyword "personal monthly income after tax" into three fields: "personal", "after tax", and "monthly income". If "personal monthly income after tax" is added to jieba's custom dictionary, jieba will not split it.

Assuming the extracted fields are ["male", "different", "occupation", "average", "age", "how", "?"], comparing them with the fields in Table 1 finds the shared fields "Occupation" and "Age", which become the target fields. Note that there can be one or more target fields.
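The comparison in step 103 can be sketched as a membership test of each segmented token against the field names of the data table. The function name `find_target_fields` is hypothetical, and lower-case English tokens stand in for the original Chinese ones.

```python
def find_target_fields(tokens, field_names):
    """Return the tokens that are also field names, in order, without repeats."""
    seen = set()
    targets = []
    for token in tokens:
        if token in field_names and token not in seen:
            seen.add(token)
            targets.append(token)
    return targets


# Token list from the segmentation example and field names from Table 1.
tokens = ["male", "different", "occupation", "average", "age", "how", "?"]
field_names = {"name", "occupation", "gender", "age",
               "personal monthly income after tax", "industry"}

target_fields = find_target_fields(tokens, field_names)
print(target_fields)  # ['occupation', 'age']
```

Note that "male" is not matched: it is an enumerated value of "gender", not a field name, which is consistent with the example above.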

104. Calculate the similarity between the target field and each other field in the field data table except the target field respectively;

After the target field is determined, the similarity between the target field and each other field in the field data table is calculated. For example, in Table 1 above, for the target field "Occupation", the similarities calculated are those between "Occupation" and "Name", "Occupation" and "Gender", "Occupation" and "Age", "Occupation" and "Personal monthly income after tax", and "Occupation" and "Industry".

Further, the similarity between the target field and any other field in the field data table can be calculated by the following steps:

(1) Combine the character string and enumerated values of the target field with the character string and enumerated values of the other field to calculate the similarity indices between the two fields, where a similarity index is a parameter used to measure the degree of similarity between two fields;

(2) Calculate the similarity between the target field and the other field according to those similarity indices.

Attribute parameters of the strings and enumerated values, such as the length of the string or the number and category of the enumerated values, are all important parameters for determining the degree of similarity between fields. Further, calculating the similarity indices between the target field and any other field may include calculating, for the two fields, a string similarity index, a string length similarity index, a similarity index of the number of enumerated values, and a similarity index of the length of the enumerated values.

Specifically, the string similarity index can be calculated using the following formula:

Wherein, s₁ represents the string similarity index, sim represents the number of identical characters in the two fields (that is, the target field and any other field), short represents the string length of the shorter of the two fields, long represents the string length of the longer of the two fields, and α is a hyperparameter used to control the impact of the string on the similarity.

The formula is constructed so that s₁ is compressed to between 0 and 1. For example, for the two fields "personal monthly income after tax" and "personal income tax", the inputs to s₁ are sim = 3 (the three shared characters, glossed as "person", "person", and "tax"), short = 5, and long = 7, with lengths counted in characters of the original Chinese field names.
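The formula for s₁ is not reproduced in this text, so the sketch below only derives its three inputs (sim, short, long) for the example pair. Two things are assumptions: the string "个人所得税" is taken as the original Chinese for "personal income tax" (it does not appear in the source), and shared characters are counted as a multiset intersection. The helper name `string_overlap` is hypothetical.

```python
from collections import Counter


def string_overlap(field_a, field_b):
    """Return (sim, short, long) for two field names.

    sim is the number of characters the two names share, counted with
    multiplicity (an assumed reading of "number of the same string").
    """
    shared = Counter(field_a) & Counter(field_b)
    sim = sum(shared.values())
    short, long_ = sorted((len(field_a), len(field_b)))
    return sim, short, long_


# "个人税后月收入" is the field name from Table 1; "个人所得税" is an
# assumed rendering of "personal income tax".
sim, short, long_ = string_overlap("个人税后月收入", "个人所得税")
print(sim, short, long_)  # 3 5 7
```

Under these assumptions the values agree with the ones stated in the example (sim = 3, short = 5, long = 7).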

The string length similarity index can be calculated using the following formula:

Wherein, s₂ represents the string length similarity index, short represents the string length of the shorter of the two fields (that is, the target field and any other field), and long represents the string length of the longer of the two fields. For example, s₂ can be calculated in this way for the fields "personal monthly income after tax" and "occupation".

The similarity index of the number of enumerated values can be calculated using the following formula:

Wherein, s₃ represents the similarity index of the number of enumerated values, min represents the number of enumerated values of whichever of the two fields has fewer, and max represents the number of enumerated values of whichever of the two fields has more. For example, if the "Occupation" field in the field data table has 6 enumerated values (police, nurse, teacher, programmer, student, and staff) and the "Gender" field has 2 enumerated values (male and female), then s₃ is calculated with min = 2 and max = 6.

The length similarity index of the enumerated values can be calculated using the following formula:

Wherein, s₄ represents the similarity index of the length of the enumerated values, avg_min represents the average enumerated-value length of whichever of the two fields has the shorter average, and avg_max represents the average enumerated-value length of whichever of the two fields has the longer average. For example, the average enumerated-value length of the "Occupation" field is (2+2+2+3+2+2)/6 ≈ 2.17 and that of the "Gender" field is (1+1)/2 = 1, so s₄ is calculated with avg_min = 1 and avg_max = 2.17.
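The formulas for s₂, s₃, and s₄ are not reproduced in this text. The sketch below assumes the simplest form consistent with the variable definitions, namely the smaller quantity divided by the larger one (short/long, min/max, avg_min/avg_max); this is an assumption, not the confirmed formula. The helper name `ratio_index` is hypothetical, and the value lengths are the character counts quoted in the examples.

```python
def ratio_index(a, b):
    """Assumed index form: smaller value over larger value, in (0, 1].

    Both inputs are expected to be positive.
    """
    low, high = min(a, b), max(a, b)
    return low / high


# s2: field-name lengths in Chinese characters (Table 1 field names).
s2 = ratio_index(len("职业"), len("个人税后月收入"))

# Enumerated-value lengths quoted in the examples above.
occupation_value_lengths = [2, 2, 2, 3, 2, 2]
gender_value_lengths = [1, 1]

# s3: number of enumerated values (6 versus 2).
s3 = ratio_index(len(occupation_value_lengths), len(gender_value_lengths))

# s4: average enumerated-value length (about 2.17 versus 1).
avg_occupation = sum(occupation_value_lengths) / len(occupation_value_lengths)
avg_gender = sum(gender_value_lengths) / len(gender_value_lengths)
s4 = ratio_index(avg_occupation, avg_gender)

print(round(s2, 3), round(s3, 3), round(s4, 3))
```

A ratio of this kind equals 1 when the two fields agree on the measured quantity and approaches 0 as they diverge, which is one way to keep each index on a common scale.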

Specifically, the calculating the similarity between the target field and the any other field according to the similarity index between the target field and the any other field may include:

Calculate the average or weighted average of the string similarity index, the string length similarity index, the similarity index of the number of enumerated values, and the similarity index of the length of the enumerated values, and use it as the similarity between the target field and the other field.
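A minimal sketch of this combination step follows. The function name `combine_indices`, the weights, and the index values are all made up for illustration; the source states only that an average or a weighted average is used.

```python
def combine_indices(indices, weights=None):
    """Average the similarity indices, optionally with weights."""
    if weights is None:
        weights = [1.0] * len(indices)
    if len(weights) != len(indices):
        raise ValueError("weights and indices must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, indices)) / total


# Made-up values standing in for s1..s4 of one field pair.
indices = [0.5, 0.5, 1.0, 0.0]

plain = combine_indices(indices)                           # 0.5
weighted = combine_indices(indices, weights=[1, 1, 2, 0])  # 0.75
print(plain, weighted)
```

Weights let one index (for example, the one based on enumerated values) count for more than the others when that suits the data table.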

105. Select the field with the highest similarity among the other fields, and replace the target field in the first question sentence to obtain a recommended second question sentence.

After calculating the similarity between the target field and each other field in the field data table, select the field with the highest similarity among the other fields and use it to replace the target field in the first question sentence, obtaining a recommended second question sentence. For example, suppose the first question sentence is "How is the average income distribution of different occupations in Shanghai?", where "Occupation" is a target field, and the field in the field data table with the highest similarity to "Occupation" is "Industry". The "occupation" in the first question sentence can then be replaced with "industry" to get the second question sentence: "How is the average income distribution in different industries in Shanghai?" Finally, the second question sentence is recommended to the user, completing one round of question recommendation.
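Step 105 can be sketched as an argmax over the computed similarities followed by a string replacement. The similarity scores below are made-up numbers chosen so that "industry" wins, as in the example; the function name `recommend_question` and the English question wording are likewise assumptions.

```python
def recommend_question(question, target_field, similarities):
    """Replace target_field with the most similar other field."""
    best_field = max(similarities, key=similarities.get)
    return question.replace(target_field, best_field)


# Hypothetical similarity of "occupation" to every other field.
similarities = {
    "name": 0.21,
    "gender": 0.35,
    "age": 0.30,
    "personal monthly income after tax": 0.18,
    "industry": 0.62,
}

first_question = "How is the average income distributed by occupation in Shanghai?"
second_question = recommend_question(first_question, "occupation", similarities)
print(second_question)
```

When a question contains several target fields, the same replacement can be applied to each of them in turn, yielding one candidate question per target field.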

In this embodiment of the application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and the fields that appear both among the extracted fields and in the field data table are determined as target fields. Then, the similarity between the target field and each other field in the field data table is calculated, and the field with the highest similarity is used to replace the target field in the question sentence, yielding the recommended question sentence. Compared with the conventional method of randomly replacing keywords of the same type in the sentence, this embodiment takes the similarity between the preset fields into account and replaces the field in the original question sentence with the most similar field, which can generate new question sentences that better meet user expectations and improve the accuracy of the questions recommended by the intelligent question answering system.

Referring to FIG. 2, a second embodiment of a question recommendation method based on field similarity calculation in an embodiment of the present application includes:

201. Obtain the input first question sentence;

202. Perform word segmentation processing on the first question sentence, and extract various fields contained therein;

203. Compare the respective fields one by one with the fields in the pre-built field data table, find out the same fields that the respective fields and the field data table have, and determine them as target fields;

Steps 201-203 are the same as steps 101-103. For details, please refer to the relevant descriptions of steps 101-103.

204. Search for all historical question sentences of the user who input the first question sentence;

After determining the target field, the server may obtain the historical question record of the user who input the first question sentence, and search for all historical question sentences of the user.

205. Construct a co-occurrence matrix according to the historical question sentence, the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user;

Then, a co-occurrence matrix is constructed according to the historical question sentence, and the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user. For example, a certain co-occurrence matrix M constructed based on the user's historical questioning sentence is:

The co-occurrence matrix M corresponds to the following Table 2:

Table 2

In Table 2, the value corresponding to "gender" and "occupation" is 18, which means that among all the historical question sentences of the user, "gender" and "occupation" co-occur in the same historical question sentence 18 times. For example, all question sentences that the user has asked are stored in advance, such as "relationship between different genders and occupations", "proportion of unmarried people in different occupations and genders", ..., "correlation between different genders and different occupations". Each of these question sentences contains both "occupation" and "gender". If there are 18 such question sentences, then "occupation" and "gender" co-occur 18 times.
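One way to build such a co-occurrence matrix is sketched below. It assumes each historical question has already been reduced to the list of fields it contains (for example, by the segmentation and matching steps); the function name `build_cooccurrence` and the three-question history are hypothetical.

```python
from itertools import combinations


def build_cooccurrence(fields, history):
    """Count, for each pair of fields, the questions containing both."""
    index = {field: i for i, field in enumerate(fields)}
    n = len(fields)
    matrix = [[0] * n for _ in range(n)]
    for question_fields in history:
        # A field counts once per question, however often it is repeated.
        present = sorted({index[f] for f in question_fields if f in index})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


fields = ["occupation", "gender", "age"]
history = [
    ["gender", "occupation"],   # e.g. a question about genders and occupations
    ["occupation", "gender"],
    ["occupation", "age"],
]

M = build_cooccurrence(fields, history)
for row in M:
    print(row)
```

The resulting matrix is symmetric with a zero diagonal, which matches the field vectors quoted later in this embodiment (the entry of a field against itself is 0).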

206. Calculate the similarity between the target field and each other field in the field data table except the target field according to the co-occurrence matrix;

After the co-occurrence matrix is constructed, the similarity between the target field and each other field in the field data table except the target field can be calculated according to the co-occurrence matrix.

Specifically, step 206 may include:

(1) Extract the field vector of the target field and the field vector of each of the other fields from the co-occurrence matrix, where each element of a field vector is the number of times the corresponding field co-occurs with one of the fields in the field data table in the same historical question sentence of the user;

(2) Calculate the cosine similarity between the field vector of the target field and the field vector of each of the other fields, respectively, to obtain the similarity between the target field and each of the other fields.

In the co-occurrence matrix, each field corresponds to a field vector. For example, the field vector of "occupation" is [0,18,27,22,3] and the field vector of "gender" is [18,0,2,15,5]; that is, the row or column of a field in the co-occurrence matrix is the field vector of that field. After the field vectors are extracted, the cosine similarity between the field vector of the target field and the field vector of each of the other fields is calculated, which gives the similarity between the target field and each of the other fields. For example, if the target field is "occupation", its similarity to the other field "gender" equals the cosine similarity of the vectors [0,18,27,22,3] and [18,0,2,15,5].
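The cosine similarity for the two quoted field vectors can be worked through as follows. The function name `cosine_similarity` and the choice to return 0.0 when a vector is all zeros are assumptions; the source does not say how a field with no history should be handled.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a field that never co-occurs
    return dot / (norm_u * norm_v)


# Field vectors quoted in the text.
occupation = [0, 18, 27, 22, 3]
gender = [18, 0, 2, 15, 5]

similarity = cosine_similarity(occupation, gender)
print(round(similarity, 3))
```

Here the dot product is 399 and the squared norms are 1546 and 578, so the similarity is 399 / sqrt(1546 × 578), roughly 0.42.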

207. Select the field with the highest similarity among the other fields, and replace the target field in the first question sentence to obtain a recommended second question sentence.

Step 207 is the same as step 105. For details, please refer to the related description of step 105.

In this embodiment of the application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and the fields that appear both among the extracted fields and in the field data table are determined as target fields. Then, all historical question sentences input by the user are retrieved and a co-occurrence matrix is constructed; the similarity between the target field and each other field in the field data table is calculated according to the co-occurrence matrix; and the field with the highest similarity is used to replace the target field in the question sentence, yielding the recommended question sentence. Compared with the first embodiment of the present application, this embodiment proposes a specific method for calculating the similarity between the target field and each other field.

Referring to FIG. 3, a third embodiment of a question recommendation method based on field similarity calculation in an embodiment of the present application includes:

301. Obtain the input first question sentence;

302. Perform word segmentation processing on the first question sentence, and extract various fields contained therein;

303. Compare the respective fields one by one with the fields in the pre-built field data table, find out the same fields that the respective fields and the field data table have, and determine them as target fields;

304. Search for all historical question sentences of the user who input the first question sentence;

305. Construct a co-occurrence matrix according to the historical question sentence, the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user;

Steps 301-305 are the same as steps 201-205. For details, please refer to the relevant descriptions of steps 201-205.

306. Determine, according to the co-occurrence matrix, a field in the field data table that co-occurs with the target field in the same historical question sentence of the user the most frequently;

307. Select the field with the largest number of times, and replace the target field in the first question sentence to obtain a recommended third question sentence.

After the co-occurrence matrix is constructed, the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user can be determined according to the co-occurrence matrix. That field is then used to replace the target field in the first question sentence, obtaining the recommended third question sentence.

For example, suppose the first question sentence is "How is the average income distribution of different occupations in Shanghai?", where "occupation" is a target field, and in the co-occurrence matrix M the field that co-occurs most often with "occupation" is "age" (27 times). The "occupation" in the first question sentence can then be replaced with "age" to get the third question sentence: "How is the average income distribution of different ages in Shanghai?"
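The selection in steps 306 and 307 can be sketched as picking the largest entry of the target field's row in the co-occurrence matrix. The row [0, 18, 27, 22, 3] is the "occupation" vector quoted in the second embodiment; the last two field names are placeholders because Table 2 is not reproduced in this text, and the function name `most_cooccurring` is hypothetical.

```python
def most_cooccurring(target, fields, vector):
    """Return (field, count) for the field co-occurring most with target."""
    candidates = [
        (count, field)
        for field, count in zip(fields, vector)
        if field != target  # never recommend the target field itself
    ]
    best_count, best_field = max(candidates, key=lambda pair: pair[0])
    return best_field, best_count


# "field_4" and "field_5" are placeholder names: the source does not say
# which fields the last two columns correspond to.
fields = ["occupation", "gender", "age", "field_4", "field_5"]
occupation_vector = [0, 18, 27, 22, 3]

best_field, best_count = most_cooccurring("occupation", fields, occupation_vector)
print(best_field, best_count)  # age 27
```

Unlike the cosine-based approach of the second embodiment, this uses the raw counts in a single row, so it favors fields the user has actually asked about together with the target field.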

In this embodiment of the application, after the fields of the input question sentence are extracted, each field is compared one by one with the fields in the pre-built field data table, and the fields that appear both among the extracted fields and in the field data table are determined as target fields. Then, all historical question sentences input by the user are retrieved and a co-occurrence matrix is constructed; the field in the field data table that co-occurs most often with the target field in the same historical question sentences of the user is determined according to the co-occurrence matrix; and that field is used to replace the target field in the first question sentence, obtaining the recommended third question sentence. Compared with the second embodiment of the present application, this embodiment proposes a question sentence generation method that also uses the co-occurrence matrix but does not calculate the similarity between fields.

It should be understood that the size of the sequence number of each step in the foregoing embodiment does not mean the order of execution. The execution sequence of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present application.

Corresponding to the question recommendation method based on field similarity calculation described in the above embodiments, FIG. 4 shows a structural block diagram of a question recommendation device based on field similarity calculation provided by an embodiment of the present application. For ease of description, only the parts related to the embodiments of the present application are shown.

Referring to Figure 4, the device includes:

The question obtaining module 401 is used to obtain the input first question sentence;

The word segmentation module 402 is configured to perform word segmentation processing on the first question sentence and extract various fields contained therein;

The field comparison module 403 is configured to compare each field one by one with the fields in the pre-built field data table, find out the same fields that the various fields and the field data table have, and determine them as target fields;

The field similarity calculation module 404 is configured to calculate the similarity between the target field and each other field in the field data table except the target field;

The question recommendation module 405 is configured to select the field with the highest similarity among the other fields, replace the target field in the first question sentence, and obtain a recommended second question sentence.

Further, the field similarity calculation module may include:

The similarity index calculation unit is used to combine the string and enumerated values of the target field with the string and enumerated values of any other field to calculate the similarity indices between the target field and that other field, where a similarity index is a parameter used to measure the degree of similarity between two fields;

The first field similarity calculation unit is configured to calculate the similarity between the target field and the other field according to the similarity indices between the target field and the other field.

Further, the similarity index calculation unit may be specifically used to calculate a string similarity index, a string length similarity index, a similarity index of the number of enumerated values, and a similarity index of the length of the enumerated values between the target field and any other field;

The first field similarity calculation unit may be specifically used to calculate the average or weighted average of the string similarity index, the string length similarity index, the similarity index of the number of enumerated values, and the similarity index of the length of the enumerated values, as the similarity between the target field and the other field.

Further, the string similarity index can be calculated using the following formula:

Wherein, s₁ represents the string similarity index, sim represents the number of identical characters in the two fields, short represents the string length of the shorter of the two fields, long represents the string length of the longer of the two fields, and α is a hyperparameter used to control the impact of the string on the similarity;

Wherein, s₂ represents the string length similarity index, short represents the string length of the shorter of the two fields, and long represents the string length of the longer of the two fields;

Wherein, s₃ represents the similarity index of the number of enumerated values, min represents the number of enumerated values of whichever of the two fields has fewer, and max represents the number of enumerated values of whichever of the two fields has more;

Wherein, s₄ represents the similarity index of the length of the enumerated values, avg_min represents the average enumerated-value length of whichever of the two fields has the shorter average, and avg_max represents the average enumerated-value length of whichever of the two fields has the longer average.

Further, the field similarity calculation module may include:

The historical sentence search unit is used to search for all historical question sentences of the user who input the first question sentence;

The co-occurrence matrix construction unit is configured to construct a co-occurrence matrix according to the historical question sentences, where the co-occurrence matrix records the number of times any two fields in the field data table appear together in the same historical question sentence of the user;

The second field similarity calculation unit is configured to calculate the similarity between the target field and the other fields according to the co-occurrence matrix.
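Constructing the co-occurrence matrix can be sketched in Python as follows. The sketch assumes that each historical question sentence has already been reduced to the set of fields it contains, and that the diagonal is left at zero; neither detail is specified by the application. The function name and sample fields are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set


def build_cooccurrence(fields: Sequence[str],
                       history: Sequence[Set[str]]) -> List[List[int]]:
    """Count how often each pair of fields appears in the same sentence."""
    index = {f: i for i, f in enumerate(fields)}
    n = len(fields)
    matrix = [[0] * n for _ in range(n)]
    for sentence_fields in history:
        # Keep only fields that are present in the field data table.
        present = sorted(index[f] for f in sentence_fields if f in index)
        for i, j in combinations(present, 2):
            # The matrix is symmetric: update both (i, j) and (j, i).
            matrix[i][j] += 1
            matrix[j][i] += 1
    return matrix


# Hypothetical field data table and per-sentence field sets for one user.
fields = ["premium", "claim_amount", "policy_term"]
history = [
    {"premium", "claim_amount"},
    {"premium", "policy_term"},
    {"premium", "claim_amount"},
]
matrix = build_cooccurrence(fields, history)
print(matrix)
```

Row i of the resulting matrix can then serve as the field vector of field i.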

Further, the second field similarity calculation unit may include:

The field vector extraction subunit is configured to extract the field vector of the target field and the field vector of each of the other fields from the co-occurrence matrix, where each element of a field vector is the number of times the corresponding field appears together with a respective field in the field data table in the same historical question sentence of the user;

The cosine similarity calculation subunit is configured to calculate the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.
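The cosine similarity step can be sketched in Python as follows, treating each row of a small, hypothetical co-occurrence matrix as a field vector. Returning 0.0 for a zero vector is an assumption made here to avoid division by zero; the application does not address that case.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarities_to_target(fields: Sequence[str],
                           matrix: List[List[int]],
                           target: str) -> Dict[str, float]:
    """Compare the target field's row with the row of every other field."""
    t = fields.index(target)
    return {f: cosine(matrix[t], matrix[i])
            for i, f in enumerate(fields) if i != t}


# Hypothetical fields and co-occurrence counts.
fields = ["premium", "claim_amount", "policy_term"]
matrix = [[0, 2, 1],
          [2, 0, 0],
          [1, 0, 0]]
sims = similarities_to_target(fields, matrix, "claim_amount")
best = max(sims, key=sims.get)
print(sims, best)
```

In this toy data, "policy_term" scores highest because it co-occurs with the same fields as "claim_amount", even though the two never appear in the same sentence.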

Further, the field similarity calculation module may further include:

A most-frequent field determination unit, configured to determine, according to the co-occurrence matrix, the field in the field data table that co-occurs most frequently with the target field in the same historical question sentences of the user;

The field replacement module is configured to select that most frequently co-occurring field and replace the target field in the first question sentence with it, to obtain a recommended third question sentence.
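Selecting the most frequently co-occurring field and generating the third question sentence can be sketched in Python as follows. The matrix, field names, and question are hypothetical, and plain substring replacement stands in for replacing a segmented field. Ties are resolved in favor of the first field in table order, which is an assumption of this sketch.

```python
from typing import List, Sequence


def most_cooccurring_field(fields: Sequence[str],
                           matrix: List[List[int]],
                           target: str) -> str:
    """Return the field that co-occurs most often with the target field."""
    t = fields.index(target)
    candidates = [i for i in range(len(fields)) if i != t]
    # max() keeps the first candidate on ties (an assumption of this sketch).
    best = max(candidates, key=lambda i: matrix[t][i])
    return fields[best]


# Hypothetical fields and co-occurrence counts.
fields = ["premium", "claim_amount", "policy_term"]
matrix = [[0, 2, 1],
          [2, 0, 0],
          [1, 0, 0]]
first_question = "What is the claim_amount this year"
target = "claim_amount"
replacement = most_cooccurring_field(fields, matrix, target)
# Assumption: simple substring replacement of the target field.
third_question = first_question.replace(target, replacement)
print(third_question)
```

Unlike the cosine-based variant, this selection favors the field that the user has most often asked about together with the target field.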

The embodiment of the present application also provides a computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the steps of any one of the question recommendation methods based on field similarity calculation shown in FIGS. 1 to 3. In addition, the computer-readable storage medium may be non-volatile or volatile.

An embodiment of the present application further provides a server, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor. When the processor executes the computer-readable instructions, the steps of any one of the question recommendation methods based on field similarity calculation shown in FIGS. 1 to 3 are implemented.

The embodiment of the present application also provides a computer program product which, when run on a server, causes the server to execute the steps of any one of the question recommendation methods based on field similarity calculation shown in FIGS. 1 to 3.

Fig. 5 is a schematic diagram of a server provided by an embodiment of the present application. As shown in FIG. 5, the server 5 of this embodiment includes a processor 50, a memory 51, and computer-readable instructions 52 stored in the memory 51 and running on the processor 50. When the processor 50 executes the computer-readable instructions 52, the steps in the above-mentioned problem recommendation method embodiments based on field similarity calculation, such as steps 101 to 105 shown in FIG. 1, are implemented. Alternatively, when the processor 50 executes the computer-readable instructions 52, the functions of the modules/units in the foregoing device embodiments, such as the functions of the modules 401 to 405 shown in FIG. 4, are implemented.

Exemplarily, the computer-readable instructions 52 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the processor 50 to carry out this application. The one or more modules/units may be a series of computer-readable instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer-readable instructions 52 in the server 5.

The server 5 may be a computing device such as a smartphone, a notebook computer, a palmtop computer, or a cloud server. The server 5 may include, but is not limited to, a processor 50 and a memory 51. Those skilled in the art can understand that FIG. 5 is only an example of the server 5 and does not constitute a limitation on the server 5. It may include more or fewer components than those shown in the figure, a combination of certain components, or different components; for example, the server 5 may also include input and output devices, network access devices, buses, and the like.

The processor 50 may be a central processing unit (Central Processing Unit, CPU), another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. The general-purpose processor may be a microprocessor or any conventional processor.

The memory 51 may be an internal storage unit of the server 5, such as a hard disk or internal memory of the server 5. The memory 51 may also be an external storage device of the server 5, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card (Flash Card) equipped on the server 5. Further, the memory 51 may include both an internal storage unit of the server 5 and an external storage device. The memory 51 is used to store the computer-readable instructions and other programs and data required by the server. The memory 51 can also be used to temporarily store data that has been output or will be output.

It should be noted that the information interaction and execution processes between the above-mentioned devices/units are based on the same concept as the method embodiments of this application; for their specific functions and technical effects, refer to the method embodiment section, which is not repeated here.

Those skilled in the art can clearly understand that, for convenience and conciseness of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated into one processing unit, or each unit can exist alone physically, or two or more units can be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of the present application. For the specific working process of the units and modules in the foregoing system, reference may be made to the corresponding process in the foregoing method embodiment, which will not be repeated here.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of the present application can be accomplished by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed by a processor, can implement the steps of the foregoing method embodiments. The computer program includes computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the photographing device/terminal device, a recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example a USB flash drive, a removable hard disk, a floppy disk, or a CD-ROM.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For parts that are not described in detail or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall fall within the scope of protection of this application.

Claims

1. A question recommendation method based on field similarity calculation, comprising:

obtaining an input first question sentence;

performing word segmentation processing on the first question sentence to extract the fields contained therein;

comparing each of the fields with the fields in a pre-built field data table one by one, finding the field that is common to the extracted fields and the field data table, and determining it as a target field;

calculating, respectively, the similarity between the target field and each other field in the field data table except the target field; and

selecting the field with the highest similarity among the other fields, and replacing the target field in the first question sentence with the selected field to obtain a recommended second question sentence.
2. The question recommendation method according to claim 1, wherein the similarity between the target field and any other field in the field data table is calculated by the following steps:

calculating a similarity index between the target field and that other field by combining the string and enumeration values of the target field with the string and enumeration values of that other field, the similarity index being a parameter used to measure the degree of similarity between two fields; and

calculating the similarity between the target field and that other field according to the similarity index between the target field and that other field.
3. The question recommendation method according to claim 2, wherein calculating the similarity index between the target field and that other field comprises:

calculating a string similarity index, a string length similarity index, an enumeration value number similarity index, and an enumeration value length similarity index between the target field and that other field;

and wherein calculating the similarity between the target field and that other field according to the similarity index comprises:

calculating the average or weighted average of the string similarity index, the string length similarity index, the enumeration value number similarity index, and the enumeration value length similarity index as the similarity between the target field and that other field.
4. The question recommendation method according to claim 3, wherein the string similarity index is calculated using the following formula:

where s₁ represents the string similarity index, sim represents the number of identical strings in the two fields, short represents the string length of the shorter of the two fields, long represents the string length of the longer of the two fields, and α is a hyperparameter used to control the influence of the string on the similarity;

the string length similarity index is calculated using the following formula:

where s₂ represents the string length similarity index, short represents the string length of the shorter of the two fields, and long represents the string length of the longer of the two fields;

the enumeration value number similarity index is calculated using the following formula:

where s₃ represents the enumeration value number similarity index, min represents the number of enumeration values of the field that has fewer enumeration values, and max represents the number of enumeration values of the field that has more enumeration values;

the enumeration value length similarity index is calculated using the following formula:

where s₄ represents the enumeration value length similarity index, avg_min represents the average enumeration value length of the field whose enumeration values are shorter on average, and avg_max represents the average enumeration value length of the field whose enumeration values are longer on average.
5. The question recommendation method according to claim 1, wherein calculating, respectively, the similarity between the target field and each other field in the field data table except the target field comprises:

searching for all historical question sentences of the user who input the first question sentence;

constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user; and

calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix.
6. The question recommendation method according to claim 5, wherein calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix comprises:

extracting the field vector of the target field and the field vector of each of the other fields from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field appears together with a respective field in the field data table in the same historical question sentence of the user; and

calculating the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.
7. The question recommendation method according to claim 5 or 6, wherein after constructing the co-occurrence matrix according to the historical question sentences, the method further comprises:

determining, according to the co-occurrence matrix, the field in the field data table that co-occurs most frequently with the target field in the same historical question sentences of the user; and

selecting that most frequently co-occurring field, and replacing the target field in the first question sentence with it to obtain a recommended third question sentence.
8. A question recommendation device based on field similarity calculation, comprising:

a question acquisition module, configured to obtain an input first question sentence;

a word segmentation module, configured to perform word segmentation processing on the first question sentence and extract the fields contained therein;

a field comparison module, configured to compare each of the fields with the fields in a pre-built field data table one by one, find the field that is common to the extracted fields and the field data table, and determine it as a target field;

a field similarity calculation module, configured to calculate, respectively, the similarity between the target field and each other field in the field data table except the target field; and

a question recommendation module, configured to select the field with the highest similarity among the other fields and replace the target field in the first question sentence with the selected field to obtain a recommended second question sentence.
9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:

obtaining an input first question sentence;

performing word segmentation processing on the first question sentence to extract the fields contained therein;

comparing each of the fields with the fields in a pre-built field data table one by one, finding the field that is common to the extracted fields and the field data table, and determining it as a target field;

calculating, respectively, the similarity between the target field and each other field in the field data table except the target field; and

selecting the field with the highest similarity among the other fields, and replacing the target field in the first question sentence with the selected field to obtain a recommended second question sentence.
10. The computer-readable storage medium according to claim 9, wherein the similarity between the target field and any other field in the field data table is calculated by the following steps:

calculating a similarity index between the target field and that other field by combining the string and enumeration values of the target field with the string and enumeration values of that other field, the similarity index being a parameter used to measure the degree of similarity between two fields; and

calculating the similarity between the target field and that other field according to the similarity index between the target field and that other field.
11. The computer-readable storage medium according to claim 10, wherein calculating the similarity index between the target field and that other field comprises:

calculating a string similarity index, a string length similarity index, an enumeration value number similarity index, and an enumeration value length similarity index between the target field and that other field;

and wherein calculating the similarity between the target field and that other field according to the similarity index comprises:

calculating the average or weighted average of the string similarity index, the string length similarity index, the enumeration value number similarity index, and the enumeration value length similarity index as the similarity between the target field and that other field.
12. The computer-readable storage medium according to claim 11, wherein the string similarity index is calculated using the following formula:

where s₁ represents the string similarity index, sim represents the number of identical strings in the two fields, short represents the string length of the shorter of the two fields, long represents the string length of the longer of the two fields, and α is a hyperparameter used to control the influence of the string on the similarity;

the string length similarity index is calculated using the following formula:

where s₂ represents the string length similarity index, short represents the string length of the shorter of the two fields, and long represents the string length of the longer of the two fields;

the enumeration value number similarity index is calculated using the following formula:

where s₃ represents the enumeration value number similarity index, min represents the number of enumeration values of the field that has fewer enumeration values, and max represents the number of enumeration values of the field that has more enumeration values;

the enumeration value length similarity index is calculated using the following formula:

where s₄ represents the enumeration value length similarity index, avg_min represents the average enumeration value length of the field whose enumeration values are shorter on average, and avg_max represents the average enumeration value length of the field whose enumeration values are longer on average.
13. The computer-readable storage medium according to claim 9, wherein calculating, respectively, the similarity between the target field and each other field in the field data table except the target field comprises:

searching for all historical question sentences of the user who input the first question sentence;

constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user; and

calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix.
14. The computer-readable storage medium according to claim 13, wherein calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix comprises:

extracting the field vector of the target field and the field vector of each of the other fields from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field appears together with a respective field in the field data table in the same historical question sentence of the user; and

calculating the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.
15. A server, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:

obtaining an input first question sentence;

performing word segmentation processing on the first question sentence to extract the fields contained therein;

comparing each of the fields with the fields in a pre-built field data table one by one, finding the field that is common to the extracted fields and the field data table, and determining it as a target field;

calculating, respectively, the similarity between the target field and each other field in the field data table except the target field; and

selecting the field with the highest similarity among the other fields, and replacing the target field in the first question sentence with the selected field to obtain a recommended second question sentence.
16. The server according to claim 15, wherein the similarity between the target field and any other field in the field data table is calculated by the following steps:

calculating a similarity index between the target field and that other field by combining the string and enumeration values of the target field with the string and enumeration values of that other field, the similarity index being a parameter used to measure the degree of similarity between two fields; and

calculating the similarity between the target field and that other field according to the similarity index between the target field and that other field.
17. The server according to claim 16, wherein calculating the similarity index between the target field and that other field comprises:

calculating a string similarity index, a string length similarity index, an enumeration value number similarity index, and an enumeration value length similarity index between the target field and that other field;

and wherein calculating the similarity between the target field and that other field according to the similarity index comprises:

calculating the average or weighted average of the string similarity index, the string length similarity index, the enumeration value number similarity index, and the enumeration value length similarity index as the similarity between the target field and that other field.
18. The server according to claim 17, wherein the string similarity index is calculated using the following formula:

where s₁ represents the string similarity index, sim represents the number of identical strings in the two fields, short represents the string length of the shorter of the two fields, long represents the string length of the longer of the two fields, and α is a hyperparameter used to control the influence of the string on the similarity;

the string length similarity index is calculated using the following formula:

where s₂ represents the string length similarity index, short represents the string length of the shorter of the two fields, and long represents the string length of the longer of the two fields;

the enumeration value number similarity index is calculated using the following formula:

where s₃ represents the enumeration value number similarity index, min represents the number of enumeration values of the field that has fewer enumeration values, and max represents the number of enumeration values of the field that has more enumeration values;

the enumeration value length similarity index is calculated using the following formula:

where s₄ represents the enumeration value length similarity index, avg_min represents the average enumeration value length of the field whose enumeration values are shorter on average, and avg_max represents the average enumeration value length of the field whose enumeration values are longer on average.
19. The server according to claim 15, wherein calculating, respectively, the similarity between the target field and each other field in the field data table except the target field comprises:

searching for all historical question sentences of the user who input the first question sentence;

constructing a co-occurrence matrix according to the historical question sentences, wherein the co-occurrence matrix records the number of times that any two fields in the field data table appear together in the same historical question sentence of the user; and

calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix.
20. The server according to claim 19, wherein calculating the similarity between the target field and each of the other fields according to the co-occurrence matrix comprises:

extracting the field vector of the target field and the field vector of each of the other fields from the co-occurrence matrix, wherein each element of a field vector is the number of times the corresponding field appears together with a respective field in the field data table in the same historical question sentence of the user; and

calculating the cosine similarity between the field vector of the target field and the field vector of each of the other fields, to obtain the similarity between the target field and each of the other fields.