CN107391591B - Data processing method and device and server - Google Patents

Info

Publication number
CN107391591B
Authority
CN
China
Prior art keywords
data
sentence
sentences
index data
synonym
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710507380.1A
Other languages
Chinese (zh)
Other versions
CN107391591A (en
Inventor
毛德峰
蒋锐滢
段希娜
黄鹏
彭玉军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201710507380.1A priority Critical patent/CN107391591B/en
Publication of CN107391591A publication Critical patent/CN107391591A/en
Application granted granted Critical
Publication of CN107391591B publication Critical patent/CN107391591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The embodiments of this specification provide a data processing method, a data processing apparatus, and a server. The method comprises: acquiring source index data that requires same-semantic expansion conversion processing; performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data; then calculating the matching degree between each of the candidate index data and the source index data, and determining the candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as extended index data of the source index data.

Description

Data processing method and device and server
Technical Field
The embodiments of this specification relate to the technical field of the internet, and in particular to a data processing method, a data processing apparatus, and a server.
Background
In the internet era, people often handle everyday affairs online. While handling affairs through an internet service system, users often have questions to ask or knowledge to acquire. To meet these needs, a service system often provides a service database containing common service-related data and corresponding index data, so that users can find the data they need by searching the index data.
In the prior art, the index data in a service database is usually fixed data written by the service operators. Because different users have different wording habits, the wording of the index data in the database can differ considerably from the wording of the search index data that real users enter. When a user searches the database for certain data, the correct data may fail to be matched even though the database contains it, because the user's search index data is worded differently from the index data in the database. The index data in prior-art databases is therefore of a single form and highly limited, which leads to a poor ability to recognize users' search index data and fails to meet users' data search needs.
Disclosure of Invention
An object of the embodiments of this specification is to provide a data processing method, apparatus, and server that can reduce the wording difference between the index data in a database and the search index data used in practice, ensure that the index data in the database covers the search index data used in practice, improve the ability to recognize users' search index data, and meet users' data search needs.
The embodiments of this specification are implemented as follows:
A data processing method, comprising:
acquiring source index data;
performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data;
calculating the matching degree between each of the candidate index data and the source index data, and determining the candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as extended index data of the source index data.
A data processing apparatus, comprising:
a source index data acquisition module, configured to acquire source index data;
a same-semantic expansion conversion processing module, configured to perform same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data;
a matching degree calculation module, configured to calculate the matching degree between each of the candidate index data and the source index data;
and an extended index data determining module, configured to determine the candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as extended index data of the source index data.
A data processing server, comprising a processor and a memory, the memory storing computer program instructions to be executed by the processor, the computer program instructions comprising:
acquiring source index data;
performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data;
calculating the matching degree between each of the candidate index data and the source index data, and determining the candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as extended index data of the source index data.
As can be seen from the above, in the embodiments of this specification, same-semantic expansion conversion processing is performed on the acquired source index data of a piece of service data, so that a plurality of extended index data with the same semantics can be obtained for that service data. When a user later searches for the service data, the wording difference between the index data in the database and the user's search index data is reduced, the index data in the database covers the search index data used in practice, the ability to recognize the user's search index data is improved, the correct service data can be matched quickly and accurately, and the user's data search needs are met.
Drawings
To illustrate the embodiments of this specification or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some of the embodiments described in this specification; those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of one embodiment of the data processing method provided in this specification;
FIG. 2 is a schematic flowchart of one embodiment of obtaining the sentence pair data to be trained provided in this specification;
FIG. 3 is a schematic structural diagram of one embodiment of the data processing apparatus provided in this specification.
Detailed Description
The embodiments of this specification provide a data processing method, a data processing apparatus, and a server.
To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.
An embodiment of a data processing method is described below. FIG. 1 is a schematic flowchart of one embodiment of the data processing method provided in this specification. Although this specification provides the method steps as described in the embodiments or flowcharts, more or fewer steps may be included on the basis of routine or non-inventive effort. The order of steps listed in the embodiments is only one of many possible execution orders and does not represent the only order of execution. In an actual implementation, a system or client product may execute the steps sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment) according to the methods shown in the embodiments or the drawings. Specifically, as shown in FIG. 1, the method may include:
S102: Acquire source index data.
In practical applications, when a user needs to search for certain data on the internet, the user usually enters data that identifies the key information of that data. For example, a user who wants the provisioning rule of a service usually searches for "service name + provisioning rule". Accordingly, an internet service system may set, for each piece of service data, index data that identifies its key information; such index data typically takes the form of a title, or of question data in a question-and-answer system. In practice, however, some users looking for the same provisioning rule may instead search for "service name + how to apply". If the internet service system uses "service name + provisioning rule" as the index data for the provisioning rule of the service, a user who enters "service name + how to apply" cannot hit that index data. Although "how to apply" and "provisioning rule" are different expressions, they have the same meaning. On this basis, in one or more embodiments of this specification, same-semantic expansion conversion processing may be performed on the initial index data (the "source index data") to obtain a plurality of index data with the same semantics for the same service data.
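The mismatch described above can be illustrated with a minimal Python sketch. The exact-match dictionary index, the `lookup` helper, and the sample strings are hypothetical and chosen only to mirror the example in the text; the specification does not prescribe how the index is stored or queried.

```python
from typing import Dict, Optional

# Hypothetical exact-match index: index data -> service data.
index: Dict[str, str] = {
    "service A provisioning rule": "Provisioning rules for service A ...",
}


def lookup(query: str, table: Dict[str, str]) -> Optional[str]:
    """Return the service data whose index data equals the query, if any."""
    return table.get(query)


# The operator-written index data is hit only by the operator's own wording.
hit = lookup("service A provisioning rule", index)
miss = lookup("service A how to apply", index)

# Adding index data with the same semantics for the same service data
# lets the differently worded query hit as well.
extended = dict(index)
extended["service A how to apply"] = index["service A provisioning rule"]
hit_after_extension = lookup("service A how to apply", extended)
```

In this sketch the second query misses only because of wording, which is the gap that the extended index data is meant to close.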
In one or more embodiments of this specification, source index data that requires same-semantic expansion conversion processing may be acquired. Specifically, the source index data is the index data on which same-semantic expansion conversion is to be performed. Typically, the internet service system sets up an index database for storing the source index data.
S104: Perform same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data.
In one or more embodiments of this specification, same-semantic expansion conversion processing may be performed on the acquired source index data to obtain a plurality of candidate index data. Specifically, this may include:
performing synonymy conversion processing on the words and sentences in the source index data to obtain synonymous words and sentences of the words and sentences in the source index data;
and performing term combination processing on the synonymous words and sentences to obtain a plurality of candidate index data.
Here, performing synonymy conversion processing on the source index data means dividing the source index data into words or phrases and converting each of them into corresponding synonymous words or phrases. For example, the source index data "how much the amount of the return insurance claim is" may be divided into "return insurance" (word), "claim" (word), and "what the amount is" (phrase). Accordingly, synonyms of "return insurance" may include "freight insurance", "refund insurance", and the like; synonyms of "claim" may include "claim settlement", "payment", and the like; and synonymous phrases of "what the amount is" may include "how much money", "what the cost is", "what the money is", and the like.
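As a minimal sketch of this step, the following Python snippet maps each segmented term to itself plus its synonyms. The lookup table `SYNONYMS` and the function `synonym_conversion` are hypothetical stand-ins: the specification obtains synonyms from a trained synonymy conversion model, not from a fixed dictionary, and the input is assumed to be already segmented.

```python
from typing import Dict, List

# Hypothetical synonym table built from the example in the text; a real
# system would obtain these from the trained synonymy conversion model.
SYNONYMS: Dict[str, List[str]] = {
    "return insurance": ["freight insurance", "refund insurance"],
    "claim": ["payment"],
    "what the amount is": ["how much money", "what the cost is"],
}


def synonym_conversion(terms: List[str], table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each term to the term itself followed by its known synonyms."""
    return {term: [term] + table.get(term, []) for term in terms}


# Assumed segmentation of the source index data into words and phrases.
terms = ["return insurance", "claim", "what the amount is"]
alternatives = synonym_conversion(terms, SYNONYMS)
```

Each entry of `alternatives` lists the interchangeable expressions for one slot of the source index data, which the later combination step can draw from.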
Term combination processing is then performed on the synonymous words and sentences. Specifically, the synonymous words and sentences are combined with the contexts found in practical applications to determine the positional relationships between them, and the positions of the words and sentences are adjusted so that the resulting candidate index data remains semantically similar to the source index data. Continuing the example above, in which the index data is "how much the amount of the return insurance claim is": in practical contexts, "how much money" has a high probability of appearing after "claim", so the positional relationship in which "how much money" follows "claim" can be determined. As for position adjustment between words and sentences, the reverse order, in which "claim" follows "how much money", also occurs with high probability in practical contexts; accordingly, adjusting the positions of the same words and sentences can yield a further candidate in that order. In this way, on the premise that the candidate index data remains semantically similar to the source index data, same-semantic expansion conversion processing can be performed more effectively on the source index data and more candidate index data can be obtained.
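The combination step can be sketched as follows, under stated assumptions. The function `combine`, the `slots` list, and the `observed` set of adjacent term pairs are all hypothetical; in the specification the positional relationships come from trained models, whereas this sketch simply keeps an ordering when every adjacent pair has been seen in context.

```python
from itertools import permutations, product
from typing import List, Set, Tuple


def combine(slots: List[List[str]], observed: Set[Tuple[str, str]]) -> List[str]:
    """Return every ordering of one term per slot whose adjacent pairs were all observed."""
    candidates = []
    for choice in product(*slots):          # pick one synonym per slot
        for order in permutations(choice):  # try every position arrangement
            if all(pair in observed for pair in zip(order, order[1:])):
                candidates.append(" ".join(order))
    return candidates


# Hypothetical data: interchangeable terms per slot, and (left, right)
# pairs assumed to occur adjacently in real-world contexts.
slots = [["return insurance"], ["claim", "payment"], ["how much money"]]
observed = {
    ("return insurance", "claim"),
    ("return insurance", "payment"),
    ("claim", "how much money"),
    ("payment", "how much money"),
    ("return insurance", "how much money"),
    ("how much money", "payment"),
}
candidates = combine(slots, observed)
```

With this toy data, the ordering in which "payment" follows "how much money" is kept alongside the original order, which mirrors the position adjustment described in the text.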
In a specific embodiment, a predetermined synonymy conversion model may be used to perform the synonymy conversion processing on the words and sentences in the source index data and obtain their synonymous words and sentences. Specifically, the source index data may be input into the predetermined synonymy conversion model for synonymy conversion processing, which converts the words and sentences in the source index data into synonymous words and sentences.
Here, the synonymy conversion model is determined in the following manner:
acquiring sentence pair data to be trained;
and performing synonymy conversion training on the sentence pair data to be trained based on a first preset machine learning algorithm to obtain the synonymy conversion model.
Further, FIG. 2 is a schematic flowchart of one embodiment of obtaining the sentence pair data to be trained provided in this specification. Specifically, as shown in FIG. 2, the sentence pair data to be trained that is used in determining the synonymy conversion model may be obtained as follows:
S210: Collect annotated sentence pair data, where the annotated sentence pair data comprises a plurality of sentence pairs, each carrying annotation data that marks the pair as semantically equivalent or semantically inequivalent.
Specifically, in one or more embodiments of this specification, the annotated sentence pair data may be a large number of sentence pairs collected from the internet, which ensures the generality of the sentence pairs.
S220: Take, as standard sentence pair data, the sentence pairs in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a first preset threshold.
S230: Perform sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model.
S240: Input into the sentence pair matching model, for sentence pair matching processing, the sentence pairs in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a second preset threshold, and determine the sentence pairs whose inter-sentence matching degree is greater than or equal to a second preset matching threshold as the sentence pair data to be trained.
Specifically, in one or more embodiments of this specification, the first preset threshold and the second preset threshold may be set according to the matching degree required between the sentences of a sentence pair in the actual application. The first preset threshold controls the matching degree between the sentences in the standard sentence pair data used to train the sentence pair matching model, and the second preset threshold controls the matching degree between the sentences of the sentence pairs that are input into the sentence pair matching model for matching. Because the matching degree required of the sentence pairs used to train the sentence pair matching model is generally higher, the first preset threshold may be greater than the second preset threshold.
Specifically, the second preset matching threshold in one or more embodiments of this specification may be set according to the matching degree required between the sentences in the sentence pair data to be trained in the actual application.
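The selection logic of S220 and S240 can be sketched in Python as below. Everything here is illustrative: the `AnnotatedPair` type, the vote counts, the threshold values, and in particular `word_overlap`, which is a simple word-set overlap standing in for the trained sentence pair matching model of S230 (the training itself is not shown).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class AnnotatedPair:
    left: str
    right: str
    equivalent_votes: int  # annotations marking the pair semantically equivalent
    total_votes: int       # all annotations collected for the pair

    @property
    def ratio(self) -> float:
        return self.equivalent_votes / self.total_votes


def select_standard_pairs(pairs: List[AnnotatedPair], first_threshold: float) -> List[AnnotatedPair]:
    """S220: keep pairs whose equivalent-annotation ratio reaches the first threshold."""
    return [p for p in pairs if p.ratio >= first_threshold]


def select_training_pairs(pairs: List[AnnotatedPair], second_threshold: float,
                          match: Callable[[str, str], float],
                          second_match_threshold: float) -> List[AnnotatedPair]:
    """S240: pre-filter by ratio, then keep pairs the matching model scores highly."""
    return [p for p in pairs
            if p.ratio >= second_threshold
            and match(p.left, p.right) >= second_match_threshold]


def word_overlap(a: str, b: str) -> float:
    """Stand-in for the trained sentence pair matching model (assumption)."""
    x, y = set(a.split()), set(b.split())
    return len(x & y) / len(x | y) if x | y else 0.0


pairs = [
    AnnotatedPair("how do I apply", "how can I apply", 9, 10),
    AnnotatedPair("what is the fee", "what is the cost", 7, 10),
    AnnotatedPair("how do I apply", "what is the fee", 1, 10),
    AnnotatedPair("open the service", "how to apply", 7, 10),
]
standard = select_standard_pairs(pairs, first_threshold=0.8)
training = select_training_pairs(pairs, second_threshold=0.6,
                                 match=word_overlap, second_match_threshold=0.5)
```

Here the first threshold (0.8) is stricter than the second (0.6), so fewer pairs qualify as standard data than are offered to the matching model, consistent with the relationship between the two thresholds described above.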
In practical applications, owing to the characteristics of a particular service field, or for a service that has only just gone online, some vocabulary conversion information that is strongly related to the service cannot be obtained from a synonymy conversion model determined from general-purpose sentence pairs. Accordingly, in another embodiment, the method further comprises:
in the process of inputting the source index data into the predetermined synonymy conversion model for synonymy conversion processing, also inputting into the synonymy conversion model the synonymous words and sentences of the service data of the service corresponding to the source index data;
correspondingly, converting the words and sentences in the source index data into synonymous words and sentences further comprises:
converting the words and sentences in the source index data into synonymous words and sentences based on the synonymous words and sentences of the service data.
Here, also inputting the synonymous words and sentences of the service data of the service corresponding to the source index data into the synonymy conversion model expands the amount of synonym data available to the model. This better supports the synonymy conversion processing of the words and sentences in the source index data, yields more synonymous words and sentences, and further improves the coverage of the subsequent index data.
In a specific embodiment, a predetermined context word and sentence position identification model and a predetermined word and sentence position adjustment model may be used to perform the term combination processing on the synonymous words and sentences and obtain a plurality of candidate index data. Specifically, this may include:
inputting the synonymous words and sentences into the predetermined context word and sentence position identification model for context word and sentence position identification processing, and determining the positional relationships between the synonymous words and sentences;
inputting the synonymous words and sentences and the positional relationships between them into the predetermined word and sentence position adjustment model for word and sentence position adjustment processing, adjusting the positions of the words and sentences, and obtaining a plurality of candidate index data determined from the synonymous words and sentences and their adjusted positional relationships;
wherein the context word and sentence position identification model is determined in the following manner:
acquiring sentence pair data to be trained;
performing context word and sentence position identification training on the sentence pair data to be trained based on a second preset machine learning algorithm to obtain the context word and sentence position identification model;
and the word and sentence position adjustment model is determined in the following manner:
acquiring sentence pair data to be trained;
and performing word and sentence position adjustment training on the sentence pair data to be trained based on a third preset machine learning algorithm to obtain the word and sentence position adjustment model.
In one or more embodiments of this specification, the sentence pair data to be trained that is used in determining the context word and sentence position identification model and the word and sentence position adjustment model may be obtained in the same way as the sentence pair data to be trained that is used in determining the synonymy conversion model; details are not repeated here.
In practical applications, owing to the characteristics of a particular service field, or for a service that has only just gone online, a context word and sentence position identification model determined from general-purpose sentence pairs may not contain some service-specific contexts. To compensate for the insufficient coverage of general contexts, in another embodiment, the method may further include:
in the process of inputting the synonymous words and sentences into the predetermined context word and sentence position identification model for context word and sentence position identification processing, also inputting into the context word and sentence position identification model the service context data of the service corresponding to the source index data;
correspondingly, determining the positional relationships between the synonymous words and sentences further comprises:
determining the positional relationships between the synonymous words and sentences based on the positional relationships between the words and sentences in the service context data.
Also inputting the service context data of the service corresponding to the source index data into the context word and sentence position identification model expands the contexts available to that model, so that the positional relationships between the synonymous words and sentences can be determined more accurately and the coverage of the subsequent index data can be improved.
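The effect of adding service context data can be sketched with simple adjacency counts. This is only an illustration: `position_relations` and both toy corpora are hypothetical, and counting adjacent substrings is a stand-in for the trained context word and sentence position identification model.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, List


def position_relations(terms: List[str], *corpora: Iterable[str]) -> Counter:
    """Count, per ordered term pair, the sentences in which the right term directly follows the left."""
    counts: Counter = Counter()
    for corpus in corpora:
        for sentence in corpus:
            for left, right in permutations(terms, 2):
                if f"{left} {right}" in sentence:
                    counts[(left, right)] += 1
    return counts


terms = ["claim", "how much money"]
# Hypothetical corpora: general-purpose context and service-specific context.
general_context = ["claim how much money for a late parcel"]
business_context = ["return insurance how much money claim per order"]

general_only = position_relations(terms, general_context)
with_business = position_relations(terms, general_context, business_context)
```

In this toy example the ordering in which "claim" follows "how much money" appears only in the service context, so it becomes available only once that context is supplied in addition to the general one.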
Specifically, the service context data may include data that reflects the transaction logic of the service. In a specific embodiment, taking "the provisioning rule of service A" as an example, the service context data may include the following: "At present, service A is open only to real-name-authenticated users in mainland China; users outside mainland China cannot use service A for the time being. We suggest using another payment method for your purchase, and thank you for supporting service A. You can log in to Paibao on a computer or a mobile phone to try to open service A; we suggest logging in to the Paibao mobile client and selecting "My-service A" to try to open it. Friendly reminder: if you cannot open it for the time being, this is because the user group that service A serves is being expanded gradually. We suggest increasing your account activity and keeping a good credit record, which improves the likelihood that the system will invite you."
Specifically, in one or more embodiments of this specification, the first, second, third, and fourth preset machine learning algorithms may be the same machine learning algorithm or different machine learning algorithms. The machine learning algorithm in one or more embodiments of this specification may include a convolutional neural network, a conventional neural network, a recursive neural network, a deep belief network, or the like, but the embodiments of this specification are not limited to these. In a specific embodiment, taking as an example the synonymy conversion training performed on the sentence pair data to be trained based on a convolutional neural network to obtain the synonymy conversion model, the training may specifically include the following steps:
inputting the sentence pair data to be trained into a preset convolutional neural network for training;
and adjusting the parameters of each layer in the convolutional neural network until the synonymy probability between the words and sentences currently output by the convolutional neural network is greater than or equal to a preset probability value, and taking the convolutional neural network at that point as the synonymy conversion model.
Specifically, the preset probability value may be any preset value that satisfies the required accuracy of synonymy judgment between words and sentences, for example 0.7. Correspondingly, when the synonymy probability between the currently output words and sentences is greater than or equal to the preset probability value, the accuracy with which the corresponding convolutional neural network judges synonymy between words and sentences can reach 70% or more.
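The stopping criterion described above (keep adjusting parameters until the output probability reaches the preset value) can be sketched as follows. To keep the example short and self-contained, a one-feature logistic model trained by gradient descent is swapped in for the convolutional neural network; the feature values, learning rate, and function names are assumptions made for illustration only.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_until(samples: List[Tuple[float, int]], preset_probability: float = 0.7,
                lr: float = 1.0, max_epochs: int = 10_000) -> Tuple[float, float, float, int]:
    """Adjust (w, b) until the mean probability given to the correct label reaches the preset value."""
    w, b = 0.0, 0.0
    for epoch in range(1, max_epochs + 1):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / len(samples)
        b -= lr * grad_b / len(samples)
        # Mean probability the model assigns to the correct label of each sample.
        score = sum(sigmoid(w * x + b) if y == 1 else 1.0 - sigmoid(w * x + b)
                    for x, y in samples) / len(samples)
        if score >= preset_probability:
            return w, b, score, epoch
    raise RuntimeError("preset probability not reached within max_epochs")


# Hypothetical training data: (similarity feature, 1 = synonymous, 0 = not).
samples = [(0.6, 1), (0.8, 1), (0.0, 0), (0.2, 0)]
w, b, score, epochs = train_until(samples, preset_probability=0.7)
```

The loop returns the parameters from the first epoch at which the criterion is met, which corresponds to taking the network at that point as the trained model.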
For the specific processes of training the context word and sentence position identification model, the word and sentence position adjustment model, and the sentence pair matching model in one or more embodiments of this specification, reference may be made to the above example of training the synonymy conversion model; details are not repeated here.
S106: respectively calculating the matching degrees between the candidate index data and the source index data, and determining the candidate index data with the matching degree between the candidate index data and the source index data being more than or equal to a first preset matching threshold value as the extended index data of the source index data.
Specifically, the matching degree in one or more embodiments of the present specification may include a specific value quantized by a preset rule, where the characterization of characters capable of reflecting the matching degree or trend between index data is quantized; in a general example, if the value of the tokenized token may be "middle" in a certain dimension, the token may be quantized to be a binary value or a hexadecimal value of the ASCII code thereof, and the matching degree between the index data according to one or more embodiments of the present disclosure is not limited to the above.
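The ASCII-based quantization mentioned in the example can be sketched in a few lines of Python. The function name `quantize` is hypothetical, and the sketch covers only the encoding of the token "middle" named in the text, not how such values would then be compared.

```python
from typing import Tuple


def quantize(token: str) -> Tuple[str, str]:
    """Return the binary and hexadecimal forms of the token's ASCII codes."""
    data = token.encode("ascii")
    binary = " ".join(f"{byte:08b}" for byte in data)
    return binary, data.hex()


binary_value, hex_value = quantize("middle")
# hex_value is "6d6964646c65"; binary_value starts with "01101101" for "m".
```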
A larger matching degree indicates that the candidate index data matches the source index data better and that the two are more likely to have the same semantics; conversely, a smaller matching degree indicates that the candidate index data matches the source index data worse and that the two are less likely to have the same semantics.
In one or more embodiments of this specification, to reduce the semantic difference between the subsequently obtained extended index data and the source index data, the matching degree between each of the candidate index data and the source index data may be calculated. Candidate index data whose matching degree with the source index data is greater than or equal to the first preset matching threshold is determined as extended index data of the source index data, and candidate index data whose matching degree is less than the first preset matching threshold is excluded, which ensures the accuracy of the extended index data.
In practical applications, the internet service system may typically set up an extended index database for storing the extended index data. The extended index database and the index database that stores the source index data may be the same database or different databases.
Specifically, the first preset matching threshold in one or more embodiments of this specification may be set according to the matching degree required between the source index data and the candidate index data in the actual application.
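The threshold filtering of S106 can be sketched as follows. The specification does not fix how the matching degree is computed, so `matching_degree` below uses word-set overlap purely as an assumed stand-in; the sample strings and the threshold of 0.5 are likewise hypothetical.

```python
from typing import List


def matching_degree(candidate: str, source: str) -> float:
    """Assumed stand-in for the matching degree: overlap of the two word sets."""
    a, b = set(candidate.split()), set(source.split())
    return len(a & b) / len(a | b) if a | b else 0.0


def select_extended(source: str, candidates: List[str], first_threshold: float) -> List[str]:
    """Keep candidates whose matching degree with the source reaches the threshold."""
    return [c for c in candidates if matching_degree(c, source) >= first_threshold]


source = "return insurance claim how much money"
candidates = [
    "return insurance payment how much money",  # shares 5 of 7 distinct words
    "return insurance how to cancel",           # shares 3 of 8 distinct words
]
extended = select_extended(source, candidates, first_threshold=0.5)
```

The second candidate has drifted in meaning and falls below the threshold, so only the first is kept as extended index data in this sketch.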
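TABLE 1
(Table 1 is provided as an image in the original publication, Figure BDA0001334987880000081; its contents are not reproduced in this text.)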
In a specific embodiment, as shown in Table 1, same-semantic expansion conversion processing is performed on the source index data "how much the amount of the return insurance claim is" using the technical solution provided in one or more embodiments of this specification, yielding a plurality of index data with the same semantics for the same service data (the "extended index data").
Therefore, in one or more embodiments of the data processing method in this specification, same-semantic expansion conversion processing is performed on the acquired source index data of a piece of service data, so that a plurality of extended index data with the same semantics can be obtained for that service data. When a user later searches for the service data, the wording difference between the index data in the database and the user's search index data is reduced, the index data in the database covers the search index data used in practice, the ability to recognize the user's search index data is improved, the correct service data can be matched quickly and accurately, and the user's data search needs are met.
Another aspect of this specification provides a data processing apparatus. FIG. 3 is a schematic structural diagram of an embodiment of the data processing apparatus provided in this specification. As shown in FIG. 3, the apparatus 300 may include:
a source index data obtaining module 310, configured to obtain source index data;
a same-semantic expansion conversion processing module 320, which may be configured to perform same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data;
a matching degree calculation module 330, configured to calculate the matching degree between each of the candidate index data and the source index data;
and an extended index data determining module 340, which may be configured to determine candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as extended index data of the source index data.
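One way to picture how the four modules fit together is the small sketch below. It is not the apparatus of FIG. 3 itself: the class name, the use of plain callables for modules 310, 320, and 330, and the toy functions in the usage lines are all assumptions chosen to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DataProcessingApparatus:
    """Hypothetical wiring of modules 310-340 as plain callables."""
    obtain_source: Callable[[], str]               # stands in for module 310
    expand: Callable[[str], List[str]]             # stands in for module 320
    matching_degree: Callable[[str, str], float]   # stands in for module 330
    first_threshold: float = 0.8                   # used by the module-340 step

    def run(self) -> List[str]:
        source = self.obtain_source()
        candidates = self.expand(source)
        scored = [(c, self.matching_degree(c, source)) for c in candidates]
        # Module 340: keep candidates at or above the first preset threshold.
        return [c for c, score in scored if score >= self.first_threshold]


# Toy stand-ins, for illustration only.
apparatus = DataProcessingApparatus(
    obtain_source=lambda: "refund amount",
    expand=lambda s: [s, "reimbursement amount", "store opening hours"],
    matching_degree=lambda c, s: 1.0 if c.endswith("amount") else 0.0,
)
extended = apparatus.run()
```

Keeping each module behind a callable makes it straightforward to replace a toy function with a trained model without changing the overall flow.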
In another embodiment, the same-semantic expansion conversion processing module 320 may include:
a synonymy conversion unit, which may be configured to perform synonymy conversion processing on the words and sentences in the source index data to obtain synonym sentences of the words and sentences in the source index data;
and a word and sentence combination unit, which may be configured to perform word and sentence combination processing on the synonym sentences to obtain a plurality of candidate index data.
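The two units above can be illustrated with a small sketch. The specification uses a trained synonymy conversion model; the lookup table `SYNONYMS` below is only a hypothetical stand-in for such a model, and the function names and example phrases are likewise assumptions.

```python
from itertools import product

# Hypothetical synonym table standing in for a trained synonymy conversion model.
SYNONYMS = {
    "refund": ["refund", "reimbursement"],
    "amount": ["amount", "sum"],
}


def synonym_conversion(phrases):
    """For each phrase, return its synonym phrases (the phrase itself included)."""
    return [SYNONYMS.get(p, [p]) for p in phrases]


def combine(synonym_lists):
    """Pick one synonym per slot and join the picks into a candidate string."""
    return [" ".join(choice) for choice in product(*synonym_lists)]


candidates = combine(synonym_conversion(["refund", "amount"]))
```

With two synonyms per slot this produces four candidates, including the original wording "refund amount". The number of candidates grows multiplicatively with the number of slots, which is one reason the later matching-degree filter is useful.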
In another embodiment, the synonymy conversion unit may include:
a first data processing unit, configured to input the source index data into a predetermined synonymy conversion model for synonymy conversion processing, which converts the words and sentences in the source index data into synonym sentences;
wherein the synonymy conversion model is determined as follows:
acquiring sentence pair data to be trained;
and performing synonymy conversion training on the sentence pair data to be trained based on a first preset machine learning algorithm to obtain the synonymy conversion model.
In another embodiment, the synonymy conversion unit may further include:
a second data processing unit, which may be configured to also input synonym sentences of the service data of the service corresponding to the source index data into the synonymy conversion model while the source index data is being input into the predetermined synonymy conversion model for synonymy conversion processing;
correspondingly, the converting, by the first data processing unit, of the words and sentences in the source index data into synonym sentences may further include:
converting the words and sentences in the source index data into synonym sentences based on the synonym sentences of the service data.
In another embodiment, the word and sentence combination unit may include:
a third data processing unit, configured to input the synonym sentences into a predetermined context word and sentence position recognition model for context word and sentence position recognition processing, and determine the position relationship between the synonym sentences;
a fourth data processing unit, configured to input the synonym sentences and the position relationship between them into a predetermined word and sentence position adjustment model for word and sentence position adjustment processing, which adjusts the positions of the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and their adjusted position relationship;
wherein the context word and sentence position recognition model is determined as follows:
acquiring sentence pair data to be trained;
performing context word and sentence position recognition training on the sentence pair data to be trained based on a second preset machine learning algorithm to obtain the context word and sentence position recognition model;
and the word and sentence position adjustment model is determined as follows:
acquiring sentence pair data to be trained;
and performing word and sentence position adjustment training on the sentence pair data to be trained based on a third preset machine learning algorithm to obtain the word and sentence position adjustment model.
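As a rough illustration of position recognition followed by position adjustment, the sketch below represents the recognized position relationship as a set of (before, after) pairs and then enumerates the orderings that respect it. Both trained models are replaced by these simple stand-ins; the representation, the function name, and the example phrases are assumptions and are not taken from the specification.

```python
from itertools import permutations


def reorder_candidates(phrases, must_precede):
    """Return every ordering of distinct `phrases` consistent with the
    (before, after) pairs in `must_precede`, joined into candidate strings."""
    results = []
    for order in permutations(phrases):
        index = {p: i for i, p in enumerate(order)}
        if all(index[a] < index[b] for a, b in must_precede):
            results.append(" ".join(order))
    return results


# Hypothetical synonym phrases and one recognized position relationship:
# "insurance claim" has to appear before "amount".
phrases = ["insurance claim", "amount", "how much"]
relations = {("insurance claim", "amount")}
orders = reorder_candidates(phrases, relations)
```

Of the six possible orderings, the three that place "insurance claim" before "amount" survive. Exhaustive enumeration is only practical for a handful of phrases; a learned adjustment model would propose orderings directly.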
In another embodiment, the word and sentence combination unit may further include:
a fifth data processing unit, configured to also input service context data of the service corresponding to the source index data into the context word and sentence position recognition model while the synonym sentences are being input into the predetermined context word and sentence position recognition model for context word and sentence position recognition processing;
correspondingly, the determining, by the third data processing unit, of the position relationship between the synonym sentences further includes:
determining the position relationship between the synonym sentences based on the position relationship between the sentences in the service context data.
In another embodiment, the sentence pair data to be trained may include data obtained as follows:
collecting annotated sentence pair data, wherein the annotated sentence pair data comprises a plurality of sentence pairs, each annotated with annotation data marking the pair as semantically equivalent or semantically inequivalent;
taking, as standard sentence pair data, each sentence pair in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a first preset threshold;
performing sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model;
and inputting each sentence pair in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a second preset threshold into the sentence pair matching model for sentence pair matching processing, and determining each sentence pair whose two sentences have a matching degree greater than or equal to a second preset matching threshold as sentence pair data to be trained.
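The selection of training data described above can be sketched as a two-stage filter. The code below is an illustration under stated assumptions: `AnnotatedPair`, `build_training_pairs`, and the threshold values are hypothetical names and numbers, and the "trained" matcher is replaced by a word-overlap (Jaccard) score rather than a real machine learning model.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedPair:
    left: str
    right: str
    equivalent_votes: int   # annotations marking the pair semantically equivalent
    total_votes: int        # all annotations for the pair (assumed > 0)

    @property
    def ratio(self) -> float:
        return self.equivalent_votes / self.total_votes


def train_jaccard_matcher(standard_pairs):
    """Stand-in for the fourth preset algorithm; it ignores its training data."""
    def matcher(left, right):
        a, b = set(left.split()), set(right.split())
        return len(a & b) / len(a | b)
    return matcher


def build_training_pairs(pairs, train_matcher, first_threshold,
                         second_threshold, second_matching_threshold):
    # Stage 1: high-agreement pairs become the standard sentence pair data.
    standard = [p for p in pairs if p.ratio >= first_threshold]
    matcher = train_matcher(standard)
    # Stage 2: pairs at or above the second threshold are scored by the matcher.
    pool = [p for p in pairs if p.ratio >= second_threshold]
    return [p for p in pool if matcher(p.left, p.right) >= second_matching_threshold]


pairs = [
    AnnotatedPair("reset my password", "change my password", 9, 10),
    AnnotatedPair("close my account", "delete my account", 7, 10),
    AnnotatedPair("close my account", "open a new account", 6, 10),
    AnnotatedPair("track my order", "weather today", 1, 10),
]
selected = build_training_pairs(pairs, train_jaccard_matcher, 0.8, 0.6, 0.4)
```

In this toy run the first two pairs are selected: the third passes the annotation-ratio filter but scores too low with the matcher, and the fourth is rejected by the annotation ratio alone.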
Another aspect of this specification provides a data processing server comprising a processor and a memory, the memory storing computer program instructions to be executed by the processor, wherein the computer program instructions may include:
acquiring source index data;
performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data;
respectively calculating the matching degrees between the candidate index data and the source index data, and determining the candidate index data with the matching degree between the candidate index data and the source index data being more than or equal to a first preset matching threshold value as the extended index data of the source index data.
Specifically, in one or more embodiments of the present disclosure, the processor may include a Central Processing Unit (CPU), and may also include other single-chip microcomputers, logic gates, integrated circuits, and the like with logic processing capability, or a suitable combination thereof. The memory may include a non-volatile memory or the like.
In another embodiment, performing the same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data may include:
performing synonymy conversion processing on the words and sentences in the source index data to obtain synonym sentences of the words and sentences in the source index data;
and performing word and sentence combination processing on the synonym sentences to obtain a plurality of candidate index data.
In another embodiment, performing the synonymy conversion processing on the words and sentences in the source index data to obtain synonym sentences of the words and sentences in the source index data may include:
inputting the source index data into a predetermined synonymy conversion model for synonymy conversion processing, which converts the words and sentences in the source index data into synonym sentences;
wherein the synonymy conversion model is determined as follows:
acquiring sentence pair data to be trained;
and performing synonymy conversion training on the sentence pair data to be trained based on a first preset machine learning algorithm to obtain the synonymy conversion model.
In another embodiment, the computer program instructions may further comprise:
while the source index data is being input into the predetermined synonymy conversion model for synonymy conversion processing, also inputting synonym sentences of the service data of the service corresponding to the source index data into the synonymy conversion model;
correspondingly, the converting of the words and sentences in the source index data into synonym sentences further includes:
converting the words and sentences in the source index data into synonym sentences based on the synonym sentences of the service data.
In another embodiment, performing the word and sentence combination processing on the synonym sentences to obtain a plurality of candidate index data may include:
inputting the synonym sentences into a predetermined context word and sentence position recognition model for context word and sentence position recognition processing, and determining the position relationship between the synonym sentences;
inputting the synonym sentences and the position relationship between them into a predetermined word and sentence position adjustment model for word and sentence position adjustment processing, which adjusts the positions of the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and their adjusted position relationship;
wherein the context word and sentence position recognition model is determined as follows:
acquiring sentence pair data to be trained;
performing context word and sentence position recognition training on the sentence pair data to be trained based on a second preset machine learning algorithm to obtain the context word and sentence position recognition model;
and the word and sentence position adjustment model is determined as follows:
acquiring sentence pair data to be trained;
and performing word and sentence position adjustment training on the sentence pair data to be trained based on a third preset machine learning algorithm to obtain the word and sentence position adjustment model.
In another embodiment, the computer program instructions may further comprise:
while the synonym sentences are being input into the predetermined context word and sentence position recognition model for context word and sentence position recognition processing, also inputting service context data of the service corresponding to the source index data into the context word and sentence position recognition model;
correspondingly, the determining of the position relationship between the synonym sentences further includes:
determining the position relationship between the synonym sentences based on the position relationship between the sentences in the service context data.
In another embodiment, the sentence pair data to be trained may include data obtained as follows:
collecting annotated sentence pair data, wherein the annotated sentence pair data comprises a plurality of sentence pairs, each annotated with annotation data marking the pair as semantically equivalent or semantically inequivalent;
taking, as standard sentence pair data, each sentence pair in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a first preset threshold;
performing sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model;
and inputting each sentence pair in the annotated sentence pair data for which the ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a second preset threshold into the sentence pair matching model for sentence pair matching processing, and determining each sentence pair whose two sentences have a matching degree greater than or equal to a second preset matching threshold as sentence pair data to be trained.
Therefore, in the embodiments of the data processing method, apparatus, or server in this specification, performing same-semantic expansion conversion processing on the source index data of given service data yields multiple extended index data with the same semantics for that service data. When a user later searches for the service data, the wording differences between the index data in the database and the user's search index data are reduced, which ensures that the index data in the database covers the search index data seen in practice. This improves recognition of the user's search index data, allows the correct service data to be matched quickly and accurately, and meets the user's data search needs.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (for example, an improvement to a circuit structure such as a diode, a transistor, or a switch) or a software improvement (an improvement to a method flow). As technology has developed, however, many of today's improvements to method flows can be regarded as direct improvements to hardware circuit structures. Designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. It therefore cannot be said that an improvement to a method flow cannot be realized with hardware entity modules. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without needing a chip manufacturer to design and fabricate a dedicated integrated circuit chip.
A controller may be implemented in any suitable manner. For example, it may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, or the form of logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller (PLC), or an embedded microcontroller. Examples of such microcontrollers include, but are not limited to, 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320. A controller may also be implemented as part of the control logic of a memory.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, with each unit described separately. Of course, when one or more embodiments of this specification are implemented, the functions of the units may be implemented in the same piece, or in multiple pieces, of software and/or hardware.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, apparatus or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The embodiments of this specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in this specification are described in a progressive manner. For parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus and server embodiments are substantially similar to the method embodiments, they are described briefly; for the relevant details, refer to the corresponding description of the method embodiments.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and changes may occur to the embodiments described herein, as will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present specification should be included in the scope of the claims of the present application.

Claims (18)

1. A method of data processing, comprising:
acquiring source index data;
performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data, including: synonymy conversion processing is carried out on the words and sentences in the source index data, and synonymy words and sentences of the words and sentences in the source index data are obtained; carrying out context word and sentence position identification processing on the synonym sentences to determine the position relation among the synonym sentences; carrying out word and sentence position adjustment processing on the position relation between the synonym sentences and the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and the position relation after the synonym sentences are adjusted;
respectively calculating the matching degrees between the candidate index data and the source index data, and determining the candidate index data with the matching degree between the candidate index data and the source index data being more than or equal to a first preset matching threshold value as the extended index data of the source index data.
2. The method according to claim 1, wherein the synonymy converting processing of the words and sentences in the source index data to obtain synonymy of the words and sentences in the source index data comprises:
inputting the source index data into a predetermined synonymy conversion model for synonymy conversion processing, converting words and sentences in the source index data into synonymy sentences, and obtaining synonymy sentences of the words and sentences in the source index data;
wherein the synonymy transformation model is determined by adopting the following method:
acquiring sentence pair data to be trained;
and carrying out synonymy transformation training on the data of the sentence to be trained based on a first preset machine learning algorithm to obtain a synonymy transformation model.
3. The method of claim 2, wherein the method further comprises:
in the process of inputting the source index data into a predetermined synonymy conversion model for synonymy conversion processing, synonym sentences of service data of services corresponding to the source index data are also input into the synonymy conversion model;
correspondingly, the converting the words and sentences in the source index data into synonym sentences further includes:
and converting the words and sentences in the source index data into synonym sentences based on the synonym sentences of the service data.
4. The method according to claim 1, wherein said subjecting said synonym sentences to a context sentence position identification process to determine a positional relationship between said synonym sentences comprises: inputting the synonyms and sentences into a predetermined context and sentence position recognition model to perform context and sentence position recognition processing, and determining the position relationship between the synonyms and sentences; wherein the context word and sentence position identification model is determined by adopting the following modes: acquiring sentence pair data to be trained; performing context word and sentence position recognition training on the data of the sentence to be trained based on a second preset machine learning algorithm to obtain a context word and sentence position recognition model;
the processing of adjusting the position of the synonym sentence and the position relation between the synonym sentences comprises: inputting the position relation between the synonym sentences and the synonym sentences into a predetermined sentence position adjusting model to perform sentence position adjusting processing, and adjusting the positions between the synonym sentences; wherein, the word and sentence position adjusting model is determined by adopting the following mode: acquiring sentence pair data to be trained; and performing word and sentence position adjustment training on the data of the sentence to be trained based on a third preset machine learning algorithm to obtain a word and sentence position adjustment model.
5. The method of claim 2, wherein the method further comprises:
in the process of inputting the synonym sentence into a predetermined context sentence position identification model for carrying out context sentence position identification processing, service context data of a service corresponding to the source index data are also input into the context sentence position identification model;
correspondingly, the determining the position relationship between the synonym sentences further includes:
and determining the position relation between the synonyms and sentences based on the position relation between the sentences in the service context data.
6. The method according to any one of claims 2 to 5, wherein the sentence pair data to be trained comprises data obtained by:
collecting marked sentence pair data, wherein the marked sentence pair data comprises a plurality of pairs of sentences marked with semantically equivalent or semantically inequivalent marked data;
taking a sentence pair with the ratio of the semantically equivalent annotation data of the sentence pair in the annotation sentence pair data to the total annotation data being more than or equal to a first preset threshold value as standard sentence pair data;
performing sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model;
and inputting the sentence pair with the ratio of the semantically equivalent annotation data of the sentence pair in the annotation sentence pair data to the total annotation data of the sentence pair being more than or equal to a second preset threshold value into the sentence pair matching model to perform sentence pair matching processing, and determining the sentence pair with the matching degree between sentences in the sentence pair being more than or equal to the second preset matching threshold value as the data of the sentence pair to be trained.
7. A data processing apparatus comprising:
the source index data acquisition module is used for acquiring source index data;
the same-semantic expansion conversion processing module is used for performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data, and comprises: synonymy conversion processing is carried out on the words and sentences in the source index data, and synonymy words and sentences of the words and sentences in the source index data are obtained; carrying out context word and sentence position identification processing on the synonym sentences to determine the position relation among the synonym sentences; carrying out word and sentence position adjustment processing on the position relation between the synonym sentences and the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and the position relation after the synonym sentences are adjusted;
the matching degree calculation module is used for calculating the matching degrees between the candidate index data and the source index data respectively;
and the extended index data determining module is used for determining candidate index data with the matching degree between the candidate index data and the source index data being greater than or equal to a first preset matching threshold as the extended index data of the source index data.
8. The apparatus of claim 7, wherein the same-semantic expansion conversion processing module comprises:
a first data processing unit, configured to input the source index data into a predetermined synonymy conversion model for synonymy conversion processing, converting the words and sentences in the source index data into synonym sentences to obtain the synonym sentences of the words and sentences in the source index data;
wherein the synonymy conversion model is determined in the following manner:
acquiring sentence pair data to be trained;
and performing synonymy conversion training on the sentence pair data to be trained based on a first preset machine learning algorithm to obtain the synonymy conversion model.
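As a toy illustration only, the idea of deriving a synonymy conversion model from sentence pair data can be sketched with a counting heuristic. The claims do not name the first preset machine learning algorithm, so the function `train_synonym_table`, its `min_count` parameter, and the single-word-difference rule are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


def train_synonym_table(sentence_pairs, min_count=1):
    """Learn word-level substitutions from semantically equivalent sentence pairs.

    A pair contributes a substitution only when both sentences have the same
    number of words and differ in exactly one position.
    """
    counts = defaultdict(Counter)
    for left, right in sentence_pairs:
        left_words, right_words = left.split(), right.split()
        if len(left_words) != len(right_words):
            continue
        diffs = [(a, b) for a, b in zip(left_words, right_words) if a != b]
        if len(diffs) == 1:
            a, b = diffs[0]
            counts[a][b] += 1
            counts[b][a] += 1  # treat synonymy as symmetric
    return {
        word: [syn for syn, n in counter.most_common() if n >= min_count]
        for word, counter in counts.items()
    }
```

A table learned this way could feed a substitution step, though a practical system would more likely use a sequence model trained on the same pairs.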
9. The apparatus of claim 8, wherein the same-semantic expansion conversion processing module further comprises:
a second data processing unit, configured to, in the process of inputting the source index data into the predetermined synonymy conversion model for synonymy conversion processing, further input synonym sentences of service data of a service corresponding to the source index data into the synonymy conversion model;
correspondingly, the converting of the words and sentences in the source index data into synonym sentences by the first data processing unit further includes:
converting the words and sentences in the source index data into synonym sentences based on the synonym sentences of the service data.
10. The apparatus of claim 7, wherein the same-semantic expansion conversion processing module comprises:
a third data processing unit, configured to input the synonym sentences into a predetermined context word and sentence position identification model for context word and sentence position identification processing, and determine the positional relationship among the synonym sentences;
a fourth data processing unit, configured to input the synonym sentences and the positional relationship among them into a predetermined word and sentence position adjustment model for word and sentence position adjustment processing, adjusting the positions of the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and the adjusted positional relationship among them;
wherein the context word and sentence position identification model is determined in the following manner:
acquiring sentence pair data to be trained;
performing context word and sentence position identification training on the sentence pair data to be trained based on a second preset machine learning algorithm to obtain the context word and sentence position identification model;
and the word and sentence position adjustment model is determined in the following manner:
acquiring sentence pair data to be trained;
and performing word and sentence position adjustment training on the sentence pair data to be trained based on a third preset machine learning algorithm to obtain the word and sentence position adjustment model.
11. The apparatus of claim 10, wherein the same-semantic expansion conversion processing module further comprises:
a fifth data processing unit, configured to, in the process of inputting the synonym sentences into the predetermined context word and sentence position identification model for context word and sentence position identification processing, further input service context data of a service corresponding to the source index data into the context word and sentence position identification model;
correspondingly, the determining of the positional relationship among the synonym sentences by the third data processing unit further includes:
determining the positional relationship among the synonym sentences based on the positional relationship among the sentences in the service context data.
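One simple way to picture the use of service context data in claim 11 is to order the synonym sentences by where their closest counterpart appears in the service context. The sketch below is an assumption throughout: the helper names `context_rank` and `order_by_context` are hypothetical, and word overlap replaces the trained context word and sentence position identification model.

```python
def context_rank(clause, context_sentences):
    """Index of the context sentence sharing the most words with the clause.

    Clauses that overlap with no context sentence are ranked last.
    """
    words = set(clause.split())
    overlaps = [len(words & set(sentence.split())) for sentence in context_sentences]
    best = max(overlaps, default=0)
    return overlaps.index(best) if best > 0 else len(context_sentences)


def order_by_context(clauses, context_sentences):
    """Reorder clauses to follow the order of their counterparts in the context."""
    # sorted() is stable, so clauses with equal rank keep their original order.
    return sorted(clauses, key=lambda clause: context_rank(clause, context_sentences))
```

For a service context such as `["open the app", "enter the old password", "set a new password"]`, the clauses `["set new passcode", "open app"]` would be reordered so that "open app" comes first.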
12. The apparatus according to any one of claims 8 to 11, wherein the sentence pair data to be trained comprises data obtained in the following manner:
collecting annotated sentence pair data, wherein the annotated sentence pair data comprises a plurality of sentence pairs annotated with annotation data indicating semantic equivalence or semantic non-equivalence;
taking sentence pairs in the annotated sentence pair data whose ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a first preset threshold as standard sentence pair data;
performing sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model;
and inputting sentence pairs in the annotated sentence pair data whose ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a second preset threshold into the sentence pair matching model for sentence pair matching processing, and determining the sentence pairs whose inter-sentence matching degree is greater than or equal to a second preset matching threshold as the sentence pair data to be trained.
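The two-stage selection of training data recited in claim 12 can be illustrated with a minimal sketch. The `AnnotatedPair` record, the function names, and the `token_jaccard` stand-in for the trained sentence pair matching model are all assumptions made for the example; the claims do not fix the threshold values or the fourth preset machine learning algorithm.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedPair:
    left: str
    right: str
    equivalent_votes: int  # annotations marking the pair semantically equivalent
    total_votes: int       # all annotations collected for the pair

    @property
    def ratio(self):
        """Share of annotations that mark the pair as semantically equivalent."""
        return self.equivalent_votes / self.total_votes if self.total_votes else 0.0


def token_jaccard(a, b):
    """Assumed stand-in for the sentence pair matching model's matching degree."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_standard_pairs(pairs, first_threshold):
    """Stage 1: pairs with high annotator agreement, used to train the matcher."""
    return [p for p in pairs if p.ratio >= first_threshold]


def select_training_pairs(pairs, second_threshold, match_model, second_matching_threshold):
    """Stage 2: pairs passing the ratio filter and then the matching model."""
    return [
        p
        for p in pairs
        if p.ratio >= second_threshold
        and match_model(p.left, p.right) >= second_matching_threshold
    ]
```

In the usage below, the second threshold is set lower than the first so that the matching model can recover pairs with weaker annotator agreement; that ordering is a choice made for the example rather than something the claims state.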
13. A data processing server comprising a processor and a memory, the memory storing computer program instructions for execution by the processor, the computer program instructions comprising:
acquiring source index data;
performing same-semantic expansion conversion processing on the source index data to obtain a plurality of candidate index data, including: performing synonymy conversion processing on the words and sentences in the source index data to obtain synonym sentences of the words and sentences in the source index data; performing context word and sentence position identification processing on the synonym sentences to determine the positional relationship among the synonym sentences; and performing word and sentence position adjustment processing on the synonym sentences and the positional relationship among them to obtain a plurality of candidate index data determined based on the synonym sentences and the adjusted positional relationship among them;
and calculating the matching degree between each of the candidate index data and the source index data, and determining the candidate index data whose matching degree with the source index data is greater than or equal to a first preset matching threshold as the extended index data of the source index data.
14. The server according to claim 13, wherein the performing of synonymy conversion processing on the words and sentences in the source index data to obtain synonym sentences of the words and sentences in the source index data includes:
inputting the source index data into a predetermined synonymy conversion model for synonymy conversion processing, converting the words and sentences in the source index data into synonym sentences to obtain the synonym sentences of the words and sentences in the source index data;
wherein the synonymy conversion model is determined in the following manner:
acquiring sentence pair data to be trained;
and performing synonymy conversion training on the sentence pair data to be trained based on a first preset machine learning algorithm to obtain the synonymy conversion model.
15. The server of claim 14, wherein the computer program instructions further comprise:
in the process of inputting the source index data into the predetermined synonymy conversion model for synonymy conversion processing, further inputting synonym sentences of service data of a service corresponding to the source index data into the synonymy conversion model;
correspondingly, the converting of the words and sentences in the source index data into synonym sentences further includes:
converting the words and sentences in the source index data into synonym sentences based on the synonym sentences of the service data.
16. The server according to claim 13, wherein the performing of context word and sentence position identification processing on the synonym sentences to determine the positional relationship among the synonym sentences comprises: inputting the synonym sentences into a predetermined context word and sentence position identification model for context word and sentence position identification processing, and determining the positional relationship among the synonym sentences; wherein the context word and sentence position identification model is determined in the following manner: acquiring sentence pair data to be trained; performing context word and sentence position identification training on the sentence pair data to be trained based on a second preset machine learning algorithm to obtain the context word and sentence position identification model;
the performing of word and sentence position adjustment processing on the synonym sentences and the positional relationship among them comprises: inputting the synonym sentences and the positional relationship among them into a predetermined word and sentence position adjustment model for word and sentence position adjustment processing, adjusting the positions of the synonym sentences to obtain a plurality of candidate index data determined based on the synonym sentences and the adjusted positional relationship among them; wherein the word and sentence position adjustment model is determined in the following manner: acquiring sentence pair data to be trained; and performing word and sentence position adjustment training on the sentence pair data to be trained based on a third preset machine learning algorithm to obtain the word and sentence position adjustment model.
17. The server of claim 16, wherein the computer program instructions further comprise:
in the process of inputting the synonym sentences into the predetermined context word and sentence position identification model for context word and sentence position identification processing, further inputting service context data of a service corresponding to the source index data into the context word and sentence position identification model;
correspondingly, the determining of the positional relationship among the synonym sentences further includes:
determining the positional relationship among the synonym sentences based on the positional relationship among the sentences in the service context data.
18. The server according to any one of claims 14 to 17, wherein the sentence pair data to be trained comprises data obtained in the following manner:
collecting annotated sentence pair data, wherein the annotated sentence pair data comprises a plurality of sentence pairs annotated with annotation data indicating semantic equivalence or semantic non-equivalence;
taking sentence pairs in the annotated sentence pair data whose ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a first preset threshold as standard sentence pair data;
performing sentence pair matching training on the standard sentence pair data based on a fourth preset machine learning algorithm to obtain a sentence pair matching model;
and inputting sentence pairs in the annotated sentence pair data whose ratio of semantically equivalent annotation data to total annotation data is greater than or equal to a second preset threshold into the sentence pair matching model for sentence pair matching processing, and determining the sentence pairs whose inter-sentence matching degree is greater than or equal to a second preset matching threshold as the sentence pair data to be trained.
CN201710507380.1A 2017-06-28 2017-06-28 Data processing method and device and server Active CN107391591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710507380.1A CN107391591B (en) 2017-06-28 2017-06-28 Data processing method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710507380.1A CN107391591B (en) 2017-06-28 2017-06-28 Data processing method and device and server

Publications (2)

Publication Number Publication Date
CN107391591A CN107391591A (en) 2017-11-24
CN107391591B true CN107391591B (en) 2020-08-04

Family

ID=60334077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710507380.1A Active CN107391591B (en) 2017-06-28 2017-06-28 Data processing method and device and server

Country Status (1)

Country Link
CN (1) CN107391591B (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135237B2 (en) * 2011-07-13 2015-09-15 Nuance Communications, Inc. System and a method for generating semantically similar sentences for building a robust SLM
CN103810218B (en) * 2012-11-14 2018-06-08 北京百度网讯科技有限公司 A kind of automatic question-answering method and device based on problem cluster
CN109241266B (en) * 2015-07-23 2020-09-11 上海智臻智能网络科技股份有限公司 Method and device for creating extended question based on standard question in man-machine interaction
CN105608199B (en) * 2015-12-25 2020-08-25 上海智臻智能网络科技股份有限公司 Extension method and device for standard questions in intelligent question-answering system

Also Published As

Publication number Publication date
CN107391591A (en) 2017-11-24

Similar Documents

Publication Publication Date Title
CN108920654B (en) Question and answer text semantic matching method and device
WO2019192261A1 (en) Payment mode recommendation method and device and equipment
US20180158078A1 (en) Computer device and method for predicting market demand of commodities
CN108733825B (en) Object trigger event prediction method and device
US20200285811A1 (en) Methods, apparatuses, and devices for generating word vectors
US11861315B2 (en) Continuous learning for natural-language understanding models for assistant systems
CN111708869B (en) Processing method and device for man-machine conversation
CN115033676B (en) Intention recognition model training and user intention recognition method and device
CN110033382B (en) Insurance service processing method, device and equipment
CN113688313A (en) Training method of prediction model, information pushing method and device
CN114817538B (en) Training method of text classification model, text classification method and related equipment
CN116049761A (en) Data processing method, device and equipment
CN112632254B (en) Dialogue state determining method, terminal device and storage medium
US20230351121A1 (en) Method and system for generating conversation flows
CN110688460B (en) Risk identification method and device, readable storage medium and electronic equipment
CN107391591B (en) Data processing method and device and server
CN114970559B (en) Intelligent response method and device
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
CN112579774A (en) Model training method, model training device and terminal equipment
US11487938B2 (en) Methods and systems for improving language processing for ambiguous instances
CN109902170B (en) Text classification method and device and electronic equipment
CN112328755A (en) Question-answering system, question-answering robot and FAQ question-answering library recalling method thereof
CN111858899A (en) Statement processing method, device, system and medium
CN110866207B (en) Data processing method, apparatus and machine readable medium
CN114792256B (en) Crowd expansion method and device based on model selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201019

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: Greater Cayman, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.

TR01 Transfer of patent right