CN113792195A - Cross-system data acquisition method and device, computer equipment and storage medium

Info

Publication number: CN113792195A
Application number: CN202111345067.5A
Authority: CN
Inventors: 李锦珊; 时爱民; 叶俊锋; 杨刚; 王跃成; 斯媛; 郭红梅; 上官翔飞; 刘军; 潘蔚勇; 丁美刚; 严冲; 彭俊; 王祥
Original assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Current assignee: Taiping Financial Technology Services Shanghai Co Ltd Shenzhen Branch
Priority date: 2021-11-15
Filing date: 2021-11-15
Publication date: 2021-12-14
Anticipated expiration: 2041-11-15
Also published as: CN113792195B

Abstract

The present application relates to the field of computer technologies, and in particular, to a cross-system data acquisition method and apparatus, a computer device, and a storage medium. The method comprises the following steps: receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object; according to the search data and the object data, filtering the search area, and determining a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems; based on the search data, performing data search on the target search area to obtain corresponding target index information; based on the target index information, corresponding target data is determined. By adopting the method, cross-system intelligent search and authority control can be realized.

Description

Cross-system data acquisition method and device, computer equipment and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a cross-system data acquisition method and apparatus, a computer device, and a storage medium.

Background

With the development of the knowledge economy, financial enterprises build one-stop intelligent search and recommendation platforms within the enterprise, which is of great importance for promoting knowledge sharing and maximizing the value of knowledge and information. Compared with traditional internet enterprises, the key point for a financial enterprise building such a platform is that strict authority control must be enforced throughout use while still meeting the needs of intelligent search.

Therefore, how to realize cross-system intelligent search and authority control is a problem which needs to be solved urgently at present.

Disclosure of Invention

Therefore, it is necessary to provide a cross-system data acquisition method, an apparatus, a computer device, and a storage medium, which are convenient and can implement cross-system intelligent search and authority control, in order to solve the above technical problems.

A cross-system data acquisition method, the method comprising: receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object; according to the search data and the object data, filtering the search area, and determining a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems; based on the search data, performing data search on the target search area to obtain corresponding target index information; based on the target index information, corresponding target data is determined.

A cross-system data acquisition apparatus, the apparatus comprising: a search request receiving module, configured to receive a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object; a target search area determining module, configured to filter the search area according to the search data and the object data and determine a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems; a target index information determining module, configured to perform data search on the target search area based on the search data to obtain corresponding target index information; and a target data determining module, configured to determine corresponding target data based on the target index information.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the method of any of the above embodiments when the processor executes the computer program.

A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any of the above embodiments.

In the above cross-system data acquisition method and apparatus, computer device, and storage medium, a search request is received, the search request carrying search data corresponding to target data to be searched and object data of a search object; the search area is filtered according to the search data and the object data to determine a target search area for data search, the search area being constructed based on the original data of a plurality of original systems; data search is then performed on the target search area based on the search data to obtain corresponding target index information; and the corresponding target data is determined based on the target index information. In this way, an index area for searching index information can be constructed from the original data of multiple original systems, which solves the problem of cross-system intelligent search. The search area is then filtered and searched based on the search data and the object data, so that authority control can be applied to the search object according to the object data during data search, which ensures the security of data search and solves the problem of authority control in cross-system search.

Drawings

FIG. 1 is a diagram illustrating an exemplary implementation of a cross-system data acquisition method;

FIG. 2 is a schematic flow chart diagram illustrating a cross-system data acquisition method in one embodiment;

FIG. 3 is a diagram illustrating the acquisition of raw data of a raw system in one embodiment;

FIG. 4 is a schematic diagram of raw data obtained from a raw system in another embodiment;

FIG. 5 is a diagram illustrating the report registration to obtain raw data in one embodiment;

FIG. 6 is a diagram illustrating data synchronization based on a base table in one embodiment;

FIG. 7 is a schematic flow chart diagram illustrating the data searching step in one embodiment;

FIG. 8 is a schematic flow chart diagram illustrating the processing steps for searching data in one embodiment;

FIG. 9 is a diagram of search area filtering in one embodiment;

FIG. 10 is a schematic illustration of collaborative feature determination in one embodiment;

FIG. 11 is a schematic illustration of collaborative feature determination in another embodiment;

FIG. 12 is a schematic diagram illustrating a card generation process in one embodiment;

FIG. 13 is a block diagram of a data acquisition device across systems in one embodiment;

FIG. 14 is a diagram illustrating an internal structure of a computer device according to an embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

The cross-system data acquisition method provided by the present application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The terminal 102 may interact with a search object and obtain a request from the search object. The terminal 102 may generate a search request and send it to the server 104, where the search request carries search data corresponding to target data to be searched and object data of the search object. After receiving the search request, the server 104 may filter the search area according to the search data and the object data and determine a target search area for data search, where the search area is constructed based on the original data of a plurality of original systems. The server 104 may then perform data search on the target search area based on the search data to obtain corresponding target index information, and determine corresponding target data based on the target index information. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device, and the server 104 may be implemented as an independent server or as a server cluster formed by a plurality of servers.

In one embodiment, as shown in fig. 2, a cross-system data acquisition method is provided, which is described by taking the method as an example applied to the server in fig. 1, and includes the following steps:

step S202, receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object.

The search request refers to a request for requesting data search, and the search request may carry search data corresponding to target data to be searched and object data of a search object.

For the financial industry, the target data to be searched may refer to financial product data and the like.

The object data refers to data related to the login account of a search object in the search system. It may include, but is not limited to, information provided at registration, such as name, gender, date of birth, contact information, and residence information, and may further include organization hierarchy information, such as the department and position within the enterprise.

In this embodiment, the search object may enter the search system based on the terminal and input search data. And the terminal generates a search request based on the search data and the object data and sends the search request to the server so as to enable the server to perform subsequent processing.
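By way of illustration only, a search request carrying both kinds of data might be modeled as follows. The application does not specify a request format, so the class names and the fields `account_id`, `department`, and `position` are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectData:
    """Data tied to the search object's login account (hypothetical fields)."""
    account_id: str
    department: str
    position: str


@dataclass(frozen=True)
class SearchRequest:
    """Search request that the terminal sends to the server."""
    search_data: str         # what the search object entered
    object_data: ObjectData  # who is searching


def build_search_request(query: str, account: ObjectData) -> SearchRequest:
    # Trim surrounding whitespace so that equivalent queries compare equal.
    return SearchRequest(search_data=query.strip(), object_data=account)


request = build_search_request(
    "  annuity product brochure ", ObjectData("u001", "1026", "analyst")
)
```

Keeping the object data inside the request lets the server apply authority control without a second round trip to the terminal.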

And step S204, filtering the search area according to the search data and the object data, and determining a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems.

The search area refers to an area that is constructed based on multiple original systems and used for data search, and the search area may include data related to original data corresponding to the multiple original systems, such as index data.

In this embodiment, the search area may include a plurality of databases, such as an index database, a vector database, and a relationship graph database, and each database may be composed of a plurality of data tables, which is not limited in this application.

In a financial enterprise, an original system may refer to systems respectively corresponding to different business functions and different business requirements, and may include, but not limited to, a report system, a non-report system, a training system, and various open websites, OA systems, data platforms, knowledge bases, and technology platforms, for example. It will be understood by those skilled in the art that this is merely an example, and that in practical applications, the original system may also include many different systems, which are not limited by the present application.

The raw data refers to data obtained from various raw systems, such as data crawled from various websites, or data obtained from various databases and data platforms.

In this embodiment, the server may construct a search area for searching based on raw data acquired from each raw system.

Further, the server may filter the search area based on the search data and the object data, for example by filtering out databases that obviously do not correspond to the search data, or databases that the search object cannot access or has no right to access, to obtain the target search area corresponding to the search data and to the object data of the search object.

It should be noted that, in the present application, the search area is filtered according to the search data and the object data both to perform authority control, which ensures secure access, and to reduce the search area, which improves efficiency. To implement authority control, on the one hand, when a search area is generated, the authority corresponding to that search area, such as an organization or a user list, is defined; for distinction, this authority is referred to herein as the first authority range. On the other hand, during searching and recommending, the corresponding second authority range is queried through the object data, and results are filtered by authority so as to realize authority control.

More specifically, during searching and recommending, the server acquires the second authority range according to the object data and matches it against the first authority range of each search area to obtain the target search area. To ensure that the authority is accurate, after the results of the target search area are displayed to the user and the user clicks on the details, the server also calls the data authority of the original system corresponding to the search area to check the second authority range. This avoids the situation in which the authority has changed in the original system but has not been updated to the server in time, and ensures that final access to the information is secure and controllable.

In one embodiment, the target search area may also be completely consistent with the search area, i.e., no information is filtered out while filtering.
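The two-stage authority control described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the `SearchArea` class, the rule for deriving the second authority range, and the `source_system_check` callback that stands in for the original system's own authority check are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchArea:
    name: str
    # Organizations or users allowed to search this area, defined when the
    # area is generated (the "first authority range").
    first_authority_range: frozenset


def second_authority_range(object_data: dict) -> frozenset:
    # Hypothetical rule: the searcher is authorized as their organization
    # and as their own account.
    return frozenset({object_data["organization"], object_data["account_id"]})


def filter_search_areas(areas, object_data):
    """Keep areas whose first authority range overlaps the searcher's range."""
    allowed = second_authority_range(object_data)
    return [area for area in areas if area.first_authority_range & allowed]


def can_open_detail(area, object_data, source_system_check) -> bool:
    """Re-check with the original system when the user opens the details."""
    if not filter_search_areas([area], object_data):
        return False
    return bool(source_system_check(area.name, object_data["account_id"]))


areas = [
    SearchArea("report_index", frozenset({"1026"})),
    SearchArea("hr_index", frozenset({"2000"})),
    SearchArea("news_index", frozenset({"1026", "2000"})),
]
user = {"account_id": "u001", "organization": "1026"}
target_areas = filter_search_areas(areas, user)
```

The second check matters because the first authority range is a copy held by the server and may lag behind the original system.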

Step S206, based on the search data, data search is carried out on the target search area, and corresponding target index information is obtained.

The index information refers to an index corresponding to specific data content, and for example, for a certain article, the index information may include information such as keywords of the subject of the article, keywords of the content, and the like.

In this embodiment, after the server filters the search area based on the search data and the object data, the server may perform data search on the filtered target search area based on the search data.

Specifically, the server may perform data search in the target search area based on a keyword, semantic recognition, and the like to find target index information related to the search data.

Step S208, based on the target index information, corresponding target data is determined.

Specifically, the server may obtain the corresponding target data according to the target index information. Continuing with the article example, the server may use the topic keywords or content keywords found in the target search area to retrieve the related articles, thereby obtaining the target data.

In this embodiment, the target data acquired by the server may be multiple target data, or the acquired target data may be only data such as abstract information and brief descriptions of articles, and the server may trigger to display complete data content based on selection of the user, which is not limited in this application.
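Steps S206 and S208 can be pictured with a small keyword-only sketch. The in-memory `INDEX` and `SUMMARIES` structures and their contents are hypothetical; the embodiment also mentions semantic recognition, which is not shown here.

```python
# Hypothetical index: search area name -> {document id -> index information}.
INDEX = {
    "news_index": {
        "n1": {"keywords": {"mobile", "office", "platform"}},
        "n2": {"keywords": {"insurance", "product"}},
    },
    "report_index": {
        "r1": {"keywords": {"premium", "report", "product"}},
    },
}

# Hypothetical content store: document id -> summary shown before full content.
SUMMARIES = {
    "n1": "Mobile office platform launch",
    "n2": "New insurance product",
    "r1": "Premium report",
}


def search_index(target_areas, search_data):
    """Step S206: collect ids of index entries sharing a keyword with the query."""
    terms = set(search_data.lower().split())
    hits = []
    for area in target_areas:
        for doc_id, info in INDEX.get(area, {}).items():
            if terms & info["keywords"]:
                hits.append(doc_id)
    return hits


def resolve_target_data(doc_ids):
    """Step S208: map target index information to target data (summaries here)."""
    return [SUMMARIES[doc_id] for doc_id in doc_ids]


hits = search_index(["news_index", "report_index"], "Product report")
results = resolve_target_data(hits)
```

Returning summaries first matches the option of showing abstract information and loading the complete content only when the user selects an item.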

In the above cross-system data acquisition method, a search request is received, the search request carrying search data corresponding to target data to be searched and object data of a search object; the search area is filtered according to the search data and the object data to determine a target search area for data search, the search area being constructed based on the original data of a plurality of original systems; data search is then performed on the target search area based on the search data to obtain corresponding target index information; and the corresponding target data is determined based on the target index information. In this way, an index area for searching index information can be constructed from the original data of multiple original systems, which solves the problem of cross-system intelligent search. The search area is then filtered and searched based on the search data and the object data, so that authority control can be applied to the search object according to the object data during data search, which ensures the security of data search and solves the problem of authority control in cross-system search.

In one embodiment, the method may further include: acquiring each original data of each original system; and constructing a feature library corresponding to the original data based on each original data.

In this embodiment, the feature library may include at least one of an index library, a vector library, and a relational library.

Specifically, the number of internal systems of a financial enterprise is generally several tens to several hundreds, and the service functions of the systems are different from each other, and the types of the systems are also different from each other.

In this embodiment, the server may classify each original system, for example, from the perspective of system positioning and user group of each original system, and the classified categories include a website, an OA, a report system, a non-report system, a data platform, a training system, a knowledge base, a technology platform, and the like, as shown in fig. 3.

Further, the server may obtain corresponding raw data from each raw system, and construct a feature library to generate the search area for data search described above.

In this embodiment, the filtering the search area according to the search data and the object data, and determining the target search area for performing the data search may include: and filtering the search area of the feature library according to the search data and the object data to determine a target search area.

In this embodiment, the filtering of the search area by the server may specifically be filtering the constructed feature library to determine the target search area from the constructed feature library.

In one embodiment, obtaining raw data of each raw system may include: determining a data acquisition mode corresponding to each original system based on system parameters of each original system, wherein the data acquisition mode comprises at least one of data capture, interface acquisition and function registration acquisition; and acquiring corresponding original data from each original system according to the determined data acquisition modes.

In this embodiment, the server classifies each original system based on system parameters of each original system, such as system positioning, user groups, and the like, and after classifying each original system, may formulate a corresponding information synchronization scheme for each type of original system, that is, a scheme for obtaining original data, for example, with reference to fig. 3, the original data may be obtained through at least one of data capture, interface acquisition, and function registration acquisition.
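One way to picture the selection of an acquisition mode from system parameters is a lookup by system category. The mapping below is an assumption made for illustration; the application only states that the mode is determined from parameters such as system positioning and user group.

```python
from enum import Enum


class AcquisitionMode(Enum):
    DATA_CAPTURE = "data capture"
    INTERFACE = "interface acquisition"
    FUNCTION_REGISTRATION = "function registration acquisition"


# Hypothetical mapping from system category to acquisition mode(s).
MODES_BY_CATEGORY = {
    "website": [AcquisitionMode.DATA_CAPTURE],
    "oa": [AcquisitionMode.DATA_CAPTURE],
    "report system": [AcquisitionMode.FUNCTION_REGISTRATION],
    "knowledge base": [AcquisitionMode.INTERFACE],
}


def acquisition_modes(system_params: dict) -> list:
    """Pick acquisition modes for a system; fall back to the docking interface."""
    category = system_params.get("category", "").lower()
    return MODES_BY_CATEGORY.get(category, [AcquisitionMode.INTERFACE])
```

Returning a list leaves room for a system that needs more than one mode, since the method allows "at least one" of the three.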

In this embodiment, data capture is suitable for crawling public information. For example, for a public information system such as OA, the server may extract key features and save them locally.

Specifically, the server may automatically capture original information from the original system through Robotic Process Automation (RPA) or other capture tools, and generate corresponding tags, vectors, or relationships.

In one embodiment, the server may obtain corresponding raw data from each raw system through a docking interface. Specifically, the server acquires data of a third-party system (original system), such as texts, pictures, videos, and attachments, through the interface, and the interface returns data of different dimensions according to the access parameters passed to it. Referring to fig. 4, the interface may return a web page snapshot, which is stored in the original page database; the server then decides whether to perform subsequent processing or to discard it.

In this embodiment, when the server acquires data of the original system through the docking interface, list (full amount/increment) and detail (single/multiple pieces according to ID) information acquisition may be supported to obtain corresponding original data.
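The list and detail acquisition modes can be sketched against an in-memory stand-in for an original system. The `RECORDS` data and the function names are hypothetical, and a real docking interface would be called over the network rather than read from a dictionary.

```python
from datetime import date

# Hypothetical records held by an original system.
RECORDS = {
    "a1": {"title": "Notice A", "updated": date(2021, 4, 1)},
    "a2": {"title": "Notice B", "updated": date(2021, 4, 29)},
    "a3": {"title": "Notice C", "updated": date(2021, 5, 3)},
}


def list_records(since=None):
    """List acquisition: full when `since` is None, incremental otherwise."""
    return sorted(
        record_id
        for record_id, record in RECORDS.items()
        if since is None or record["updated"] > since
    )


def get_details(ids):
    """Detail acquisition: accept a single ID or several IDs."""
    if isinstance(ids, str):
        ids = [ids]
    # Unknown IDs are skipped rather than raising, so one bad ID does not
    # abort a batch.
    return [RECORDS[record_id] for record_id in ids if record_id in RECORDS]
```

For example, `list_records()` performs a full listing, while `list_records(date(2021, 4, 15))` returns only the records updated after that date.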

In one embodiment, the interface parameter related data may be as shown in the following table.

Table 1

In one embodiment, the server may obtain the original data of the original system by means of function registration.

Specifically, when the service interface or the statistical form of the original system can be registered by means of functional registration, the server can acquire the original data by means of functional registration acquisition. The following description will be made by taking a report as an example with reference to fig. 5.

Specifically, the report registration acquisition may include three steps, namely, report registration, report management, and report monitoring.

In this embodiment, the server may provide a report registration function interface for the original system, and the original system designates an administrator to complete report function registration. The attribute information to be registered may include a title, query parameters (field name, type, enumerated values), output dimensions and indexes, a URL (Uniform Resource Locator), an update frequency (day/week/month/quarter/year), the report generation duration, a function description, the display mode of the detail page (description information of the report or a customized card), and the like.

Further, the server can allow an administrator to change report information, such as the attribute information and whether the report is enabled, and allow the administrator to tag reports manually, providing a reference for accurate search and recommendation. For reports with a fixed update frequency and a long generation time, snapshots are automatically captured in advance.

Further, the server can automatically monitor the report catalog of the original system; once a new report is found or an existing report is found to have changed, the server automatically extracts the attribute information of the new or changed report and assists the administrator in completing the change to the report information.
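The registration, management, and monitoring steps can be illustrated with a small sketch. The `ReportRegistration` fields are a subset of the attributes listed above, and the 60-second threshold for a "long" generation time is an assumption, since the application gives no figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportRegistration:
    title: str
    url: str
    update_frequency: str    # day / week / month / quarter / year
    generation_seconds: int  # report generation duration
    enabled: bool = True     # the administrator may switch a report off


def needs_snapshot(report, slow_threshold_seconds=60) -> bool:
    """Pre-capture snapshots for enabled reports that are slow to generate."""
    return report.enabled and report.generation_seconds >= slow_threshold_seconds


def detect_changes(registered: dict, catalog: dict):
    """Compare registered reports with the original system's report catalog."""
    new = sorted(set(catalog) - set(registered))
    changed = sorted(
        key for key in set(catalog) & set(registered)
        if catalog[key] != registered[key]
    )
    return new, changed


registered = {
    "r1": ReportRegistration("Premium", "https://example.invalid/r1", "day", 120),
}
catalog = {
    "r1": ReportRegistration("Premium", "https://example.invalid/r1", "week", 120),
    "r2": ReportRegistration("Claims", "https://example.invalid/r2", "month", 5),
}
new_reports, changed_reports = detect_changes(registered, catalog)
```

The detected identifiers would then be handed to the administrator, who confirms the change to the report information.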

In this embodiment, to ensure the intelligence level and interactive response efficiency of the server, after acquiring the original data of the original system, the server may further record the process of synchronizing information with external systems by establishing an information acquisition reference table, i.e., a base table. As shown in fig. 6, the server acquires raw data from each raw system, generates each item of data through the base table, and stores it in the corresponding database. Information labeling is performed during data warehousing: based on a label system and label calculation rules, the corresponding labels are applied to the original system data as stock data and incremental data are warehoused, so as to realize incremental processing.
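A minimal sketch of labeling during warehousing follows. The keyword-based `TAG_RULES` and the dictionary used as a base table are assumptions; the application does not define the label system or its calculation rules.

```python
# Hypothetical label calculation rules: label -> keywords that trigger it.
TAG_RULES = {
    "technology": ("platform", "system"),
    "product": ("insurance", "annuity"),
}


def compute_tags(text: str) -> list:
    lowered = text.lower()
    return sorted(
        tag for tag, words in TAG_RULES.items()
        if any(word in lowered for word in words)
    )


def ingest(base_table: dict, records: list) -> list:
    """Warehouse records by id; only new or changed records are (re)labeled."""
    processed = []
    for record in records:
        stored = base_table.get(record["id"])
        if stored is not None and stored["text"] == record["text"]:
            continue  # unchanged record: skipping it gives incremental processing
        base_table[record["id"]] = {
            "text": record["text"],
            "tags": compute_tags(record["text"]),
        }
        processed.append(record["id"])
    return processed


base_table = {}
first_run = ingest(base_table, [{"id": "1", "text": "New insurance platform"}])
second_run = ingest(base_table, [{"id": "1", "text": "New insurance platform"}])
```

Because the base table remembers what was already warehoused, a repeated synchronization touches only the records that are new or have changed.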

In one embodiment, the database storage may include index storage and/or vector storage. For example, in index storage, the storage location is WEB_TPJK (index library) -> NEWS (news type table) -> id: 8a84830c78f97a3101791cd55afa5e3d, and the data detail is {"title": "easily enjoy XX2.0, create a group one-stop mobile intelligent office platform", "organization": "1026", "publichtime": "2021-04-29", "summary": "...", "content": "<p>in China XX..."}. In vector storage, the storage location is WEB_TPJK (database) -> NEWS (news table space). The content detail is an article-level vector (generated from the title, keywords, and abstract) with id 8a84830c78f97a3101791cd55afa5e3d and detail {organization: "1026", val: [3.63411546e-01, -2.62404308e-02, ..., 3.96325350e-01, 2.66762227e-01, ...]}.

In one embodiment, constructing a feature library corresponding to the raw data based on each raw data may include: performing horizontal classification and/or vertical classification on each raw data to obtain classified raw data, wherein horizontal classification classifies according to the hierarchical labels of the raw data, and vertical classification classifies according to the data features of the raw data; performing data preprocessing on each classified raw data to generate at least one of corresponding index data, vector data, and relationship data; and constructing the feature library corresponding to the raw data based on at least one of the index data, the vector data, and the relationship data.

The vector data refers to the corresponding data content expressed in vector form. The index data may include keywords, key information, and the like. Relationship data may refer to associations between original content.

In this embodiment, after the server acquires raw data from each raw system, the server may perform normalization conversion processing on the acquired raw data.

In this embodiment, the server may perform the normalization conversion processing on the acquired raw data in two parts. In the first part, the server standardizes the text content of the acquired raw data: deleting unknown symbols, garbled characters, Martian script (nonstandard internet character substitutions), and emoticons; normalizing punctuation marks and uniformly converting them into Chinese punctuation marks; converting English letters into lower case; and converting traditional Chinese characters into simplified Chinese characters. In the second part, the server converts the format of the acquired raw data, such as turning a text into a document and a line into a sentence.
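The first part of the normalization can be sketched with the Python standard library alone. The punctuation map and the set of allowed characters are assumptions chosen for this sketch, and the conversion from traditional to simplified Chinese is omitted because it needs a conversion table or a dedicated library.

```python
import re

# Map common ASCII punctuation to full-width Chinese punctuation.
PUNCT_MAP = str.maketrans({
    ",": "，", ";": "；", ":": "：", "?": "？", "!": "！", "(": "（", ")": "）",
})

# Anything outside CJK ideographs, ASCII letters and digits, whitespace,
# the ASCII period, and the normalized punctuation is treated as noise.
NOISE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s，。；：？！（）.]")


def normalize_text(text: str) -> str:
    text = text.translate(PUNCT_MAP)  # unify punctuation
    text = NOISE.sub("", text)        # drop emoticons and unknown symbols
    text = text.lower()               # lower-case English letters
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace


cleaned = normalize_text("Hello, WORLD!! 😀 保险(产品)★")
```

The ASCII period is kept on purpose so that version strings and decimals such as 2.0 survive the cleanup.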

Further, the server may classify the different types of raw data, such as texts, pictures, and web pages, and formulate a dedicated analysis adapter for each type to perform data preprocessing and generate the corresponding index data, vector data, and relationship data. Because the integrated information has systematic characteristics, such as the organization it belongs to, the business field it belongs to, its specific user group, and its application scenario, a hierarchical label structure is constructed horizontally when building the index data; that is, hierarchical labels are constructed from those systematic characteristics. Specific feature labels are planned vertically; that is, feature labels are constructed from the data features of the data, and the calculation rule of each label is made explicit. Thus, when serving a specific user, applications such as data filtering, data classification, and data clustering can be supported at the label level in combination with user attributes such as organization and position, providing the user with a more convenient, faster, and more efficient search.

Specifically, the server may classify each raw data by using a text classification model to obtain a corresponding classification result.

In this embodiment, the server may improve the generalization of information analysis by specifying a dedicated analysis adapter for each different type of raw data, and the functions of the analysis adapters may be continuously expanded as needed during actual operation. The analysis adapters for different types of raw data are shown in Table 2 below.

Table 2

In the embodiment, in the original data analysis process, the server can respectively construct indexes, vectors and relationship information for each original data, provide data with different dimensions for intelligent search and intelligent recommendation, and lay a data foundation for guaranteeing the accuracy of the intelligent search and recommendation and the richness of contents. That is, the server may perform data preprocessing on each classified raw data to generate at least one of corresponding index data, vector data, and relationship data.

Further, the server may construct a feature library corresponding to the original data according to at least one of the index data, the vector data, and the relationship data, for example by constructing an index library from the index data, a vector library from the vector data, and a relationship graph library from the relationship data.

In this embodiment, storage is organized per item of integrated content; that is, within the same system, some content may have only indexes, some only vectors, some only relationships, and some may be stored in several forms, in which case index data, vector data, and relationship data are generated simultaneously from the original data. Taking web page data as an example, the server may generate corresponding index data, vector data, and relationship data from the original web page data, and store the index data in the index library, the vector data in the vector library, and the relationship data in the relationship graph library.

In one embodiment, performing data preprocessing on each classified raw data to obtain at least one of corresponding index data, vector data, and relationship data may include: performing word segmentation on each original data and obtaining corresponding index features based on the word segmentation results; determining a classification label of each original data based on its classification result; determining the similarity between the original data according to the classification labels; obtaining the vector features of each original data based on that original data; determining an association relationship between the original data according to the similarity between the original data and the vector features; and obtaining at least one of the corresponding index data, vector data, and relationship data based on the index features, the vector features, and the association relationship.

In this embodiment, to provide the search object with three types of intelligent search capability, at the keyword level, the semantic level, and the relationship level, the server may, during information analysis (that is, data preprocessing), construct index, vector, and relationship information for each original data according to its content. This provides features of different dimensions for intelligent search and intelligent recommendation and lays a data foundation for the accuracy of search and recommendation and the richness of content.

In this embodiment, when the server classifies the original data through the text classification model, the server may generate a corresponding classification tag. The category labels may be determined based on specific business scenarios and may include, for example, party activities, technical exchanges, strategic collaborations, corporate colleagues, and the like.

Further, the server may convert the classification tag of the original data into a one-hot format, e.g., 1 for an existing classification value and 0 for a non-existing classification value.

Further, the server may calculate the similarity between the classification tags of different original data using the Jaccard algorithm.

In this embodiment, after obtaining the similarity of the classification label of each piece of original data, the server may retain n pieces of original data with the highest similarity, such as n articles, to obtain an original data similarity list 1, that is, to obtain the association relationship of each piece of original data in the classification. Where n is an empirical parameter set according to actual use.
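As an illustrative, non-limiting sketch of the steps above, the following Python code converts classification tags into one-hot vectors, computes the Jaccard similarity between them, and keeps the n most similar items for each piece of original data. The function names, the tag vocabulary, and the tie-breaking by ID are assumptions made for illustration and are not specified by this embodiment.

```python
from typing import Dict, List, Sequence, Set


def one_hot(tags: Set[str], vocabulary: Sequence[str]) -> List[int]:
    """1 for a classification value that is present, 0 for one that is absent."""
    return [1 if tag in tags else 0 for tag in vocabulary]


def jaccard(u: Sequence[int], v: Sequence[int]) -> float:
    """Jaccard similarity of two binary vectors (0.0 when both are all zero)."""
    intersection = sum(1 for x, y in zip(u, v) if x and y)
    union = sum(1 for x, y in zip(u, v) if x or y)
    return intersection / union if union else 0.0


def similarity_list(labels: Dict[str, Set[str]],
                    vocabulary: Sequence[str], n: int) -> Dict[str, List[str]]:
    """For each item, keep the IDs of the n items with the most similar tags."""
    vectors = {doc_id: one_hot(tags, vocabulary) for doc_id, tags in labels.items()}
    result = {}
    for doc_id, vec in vectors.items():
        scored = [(jaccard(vec, other), other_id)
                  for other_id, other in vectors.items() if other_id != doc_id]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by ID (assumption)
        result[doc_id] = [other_id for _, other_id in scored[:n]]
    return result


# Hypothetical tag vocabulary and documents.
VOCABULARY = ["party activity", "technical exchange", "strategic collaboration"]
LABELS = {
    "doc1": {"technical exchange", "strategic collaboration"},
    "doc2": {"technical exchange"},
    "doc3": {"party activity"},
}
similarity_list_1 = similarity_list(LABELS, VOCABULARY, n=1)
```

The resulting dictionary maps each original data ID to its similarity list, which matches the key/value layout in which the similarity lists are stored.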

Further, the server may generate a topic vector and a semantic vector corresponding to each raw data, respectively, by using the vector generation model.

Specifically, the server may generate a topic vector for each original data using a topic model, such as a latent Dirichlet allocation (LDA) model, and obtain the topic vector distribution of each original data based on the topic vectors.

In this embodiment, the server may compare the topic vector distributions of different original data using the cosine distance.

Further, based on the calculated similarity of the topic vector distributions, the server may retain, for each original data, the n pieces of original data with the highest similarity, such as n articles, to obtain an original data similarity list 2, that is, the association relationship of the original data on topics.
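A minimal sketch of the topic-level comparison is given below. It assumes the topic distributions have already been produced by a topic model and are supplied as plain lists of floats; the function names and the sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def topic_similarity_list(topics: Dict[str, Sequence[float]], n: int) -> Dict[str, List[str]]:
    """For each item, keep the n items whose topic distribution is closest."""
    result = {}
    for doc_id, vec in topics.items():
        scored = [(cosine_similarity(vec, other), other_id)
                  for other_id, other in topics.items() if other_id != doc_id]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[doc_id] = [other_id for _, other_id in scored[:n]]
    return result


# Hypothetical two-topic distributions for three documents.
TOPICS = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
similarity_list_2 = topic_similarity_list(TOPICS, n=1)
```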

In this embodiment, after obtaining the original data similarity list 1 and the original data similarity list 2, the server may store the original data similarity list 1 and the original data similarity list 2 in a database, where key1 is an original data id, value1 is the original data similarity list 1, key2 is an original data id, and value2 is the original data similarity list 2.

In this embodiment, the server may perform vector embedding on the original content using the domain word embedding model, generate a title vector and a segment level vector, respectively, and store the vectors in the vector database. The term "domain" mainly refers to the professional vocabulary in the insurance domain and the professional vocabulary in the enterprise, such as organization name, system name, product name, etc.

In this embodiment, the server may use the domain segmentation model to segment the data content of the original data to obtain the segmentation of the data content, filter the vocabulary with lower weight according to the weight of the vocabulary, and generate the inverted index for the remaining vocabulary.

Further, the server may store the inverted-index vocabulary in a database, such as the non-relational database Elasticsearch (ES).

In this embodiment, for the ES database, the server may generate one index for publicly authorized content, used for public-permission retrieval, and one index for each of the remaining professional companies, used for complex-permission retrieval.
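The index construction described above can be sketched as follows. The sketch assumes the documents have already been segmented into tokens and that each word has a precomputed weight; how the weight is obtained, the threshold value, and all names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_inverted_index(docs: Dict[str, List[str]],
                         weights: Dict[str, float],
                         min_weight: float) -> Dict[str, Set[str]]:
    """Map each sufficiently weighted word to the IDs of the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            # Words with a weight below the threshold are filtered out.
            if weights.get(token, 0.0) >= min_weight:
                index[token].add(doc_id)
    return dict(index)


# Hypothetical segmented documents and word weights.
DOCS = {"d1": ["life", "insurance", "the"], "d2": ["insurance", "claim"]}
WEIGHTS = {"life": 0.8, "insurance": 0.6, "claim": 0.7, "the": 0.01}
inverted_index = build_inverted_index(DOCS, WEIGHTS, min_weight=0.1)
```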

In this embodiment, the domain word embedding model is obtained by pre-training any word embedding model. Here, the training process of the model is described in detail by taking a BERT (Bidirectional Encoder Representations from Transformers) model as an example.

In this embodiment, the server may obtain training data, such as text data accumulated by the company, and mask some of the words in the text data to obtain masked text data. An example is: "[CLS] china tai [MASK] is originated from three national brands of insurance 1, insurance 2, and insurance 3, the national business is stopped by the insurance 1 and the insurance 2, and the private business [MASK] is macadamia and overseas insurance business is stopped in 1956 according to the national unification [MASK]. [SEP] all [MASK] in 1999, foreign insurance machines [MASK] are managed by the GmbH of China. [SEP]".

Further, the server may convert the masked text data into numeric IDs according to a preset dictionary, look up the corresponding vectors in a vector matrix (which may be randomly initialized or taken from an existing vector matrix) according to the IDs, and input the vectors into the word embedding model.

In this embodiment, the server may input batch_size × 2 pieces of masked text data at a time, where batch_size is a parameter that can be set freely according to machine performance. Here batch_size = 16, so batch_size × 2 = 32.

In this embodiment, the server uses the word embedding model to predict a vector for each [MASK] and, at the same time, to generate a sentence vector from [CLS].

In this embodiment, the server may calculate the inner products between the sentence vectors of the first 16 texts and those of the last 16 texts to obtain a 16 × 16 similarity matrix, and average the vectors of the two [MASK] predictions.

In this embodiment, for the [MASK] vectors, the server may apply softmax to obtain the character prediction loss. For the similarity matrix of sentence vectors, the server may apply softmax row by row to obtain the similarity loss. The model pre-training loss = character prediction loss + similarity loss.
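The loss combination can be illustrated with the following plain-Python sketch, which leaves out the model itself and operates on already computed logits and sentence vectors. It assumes that the i-th text of the first half and the i-th text of the second half form a matching pair, so the target of row i of the similarity matrix is column i, and that each loss is averaged over its examples. Neither assumption is stated in this embodiment, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[target]


def prediction_loss(mask_logits: List[Sequence[float]], mask_targets: List[int]) -> float:
    """Mean softmax loss over all [MASK] positions."""
    losses = [cross_entropy(l, t) for l, t in zip(mask_logits, mask_targets)]
    return sum(losses) / len(losses)


def similarity_loss(first_half: List[Sequence[float]],
                    second_half: List[Sequence[float]]) -> float:
    """Row-wise softmax loss over the inner-product similarity matrix."""
    losses = []
    for i, u in enumerate(first_half):
        row = [sum(a * b for a, b in zip(u, v)) for v in second_half]
        losses.append(cross_entropy(row, i))  # assumed target: the diagonal entry
    return sum(losses) / len(losses)


def pretraining_loss(mask_logits, mask_targets, first_half, second_half) -> float:
    """Pre-training loss = character prediction loss + similarity loss."""
    return prediction_loss(mask_logits, mask_targets) + similarity_loss(first_half, second_half)
```

With batch_size = 16, `first_half` and `second_half` would each hold 16 sentence vectors, giving the 16 × 16 similarity matrix described above.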

Further, the server may iteratively train the word embedding model according to the pre-training loss obtained by the calculation.

In this embodiment, when the server generates the index data, the vector data, and the relationship data, the server may further generate an object tag corresponding to an object based on object data of the object acquired in the original system.

Specifically, the server may generate a static tag of the corresponding object using a static attribute of the object, such as gender, age, position, and the like, and generate a dynamic tag of the corresponding object using interactive behavior data of the object, such as click history, last browsing content, last browsing time, and the like.

Further, the server may save the generated static and dynamic tags of the object to a database, such as a feature library.

In this embodiment, for a static tag such as age, the server may generate 100 discrete values over the range 0-100 and, for each object, generate an age group [t-1, t, t+1], where t is the real age of the object. For other information items, such as post, department, and position, the server may convert them into a plurality of discrete values according to the specific category information.
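The age-group construction can be sketched in a few lines. Dropping values that fall outside the 0-100 range at the boundaries is an assumption of this sketch; this embodiment does not say how boundary ages are handled.

```python
from typing import List


def age_group(age: int, low: int = 0, high: int = 100) -> List[int]:
    """Return the discrete age values [t-1, t, t+1] for real age t.

    Values outside [low, high] are dropped (an assumption for boundary ages).
    """
    return [t for t in (age - 1, age, age + 1) if low <= t <= high]
```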

In this embodiment, the server may perform vector embedding on the titles of the content clicked by the object, converting the text into vectors with the vector embedding model mentioned above. Here, the server may convert the titles of the last 30 pieces of content clicked by the object into vectors, and take the average of the 30 vectors as the embedding vector of the object.
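A minimal sketch of the averaging step follows. It assumes the title vectors are already available, ordered from oldest to newest click, and all of the same dimension; the function name is hypothetical.

```python
from typing import List, Sequence


def object_vector(title_vectors: List[Sequence[float]], window: int = 30) -> List[float]:
    """Average the title vectors of the most recent `window` clicks."""
    recent = title_vectors[-window:]
    if not recent:
        return []
    dim = len(recent[0])
    return [sum(vec[i] for vec in recent) / len(recent) for i in range(dim)]
```

If the object has fewer than 30 clicks, this sketch simply averages whatever is available.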

In this embodiment, the server may also generate short-term and long-term interest tags for the object. For the short-term interest tags, the server may obtain a classification tag list from the content the object read in its last 30 reads or last 7 days, sort the classification tags by frequency from high to low, and keep the top n as the short-term interest tags of the object. For the long-term interest tags, the server may do the same over the last 300 reads or last 180 days and keep the top n as the long-term interest tags of the object.

Further, the server may also generate a title vector of the content last browsed by the object. In addition, the time elapsed since the object finished its last browsing may be converted into a discrete value of 0 or 1 according to whether it exceeds thirty minutes.

In this embodiment, the server may generate static relationship data based on the organizational structure of the objects, such as superior, subordinate, and peer relationships between employees and the affiliation between users and organizations, and may form dynamic relationship data between objects based on the overlap in the content they access.
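The interest tags can be sketched with a frequency counter. The sketch below reads "the last 30 times or 7 days" as "reads within the last 7 days, capped at the 30 most recent", which is only one possible interpretation; the record layout and names are likewise assumptions.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

Read = Tuple[datetime, Sequence[str]]  # (read time, classification tags of the content)


def interest_tags(history: List[Read], now: datetime,
                  max_reads: int, max_days: int, n: int) -> List[str]:
    """Top-n classification tags by frequency over a recent window.

    `history` is assumed to be ordered from oldest to newest.
    """
    cutoff = now - timedelta(days=max_days)
    recent = [tags for read_time, tags in history if read_time >= cutoff][-max_reads:]
    counts = Counter(tag for tags in recent for tag in tags)
    return [tag for tag, _ in counts.most_common(n)]


# Hypothetical reading history.
NOW = datetime(2021, 11, 15)
HISTORY = [
    (NOW - timedelta(days=200), ["strategy"]),
    (NOW - timedelta(days=100), ["technology"]),
    (NOW - timedelta(days=90), ["technology"]),
    (NOW - timedelta(days=3), ["party activity"]),
    (NOW - timedelta(days=2), ["party activity"]),
    (NOW - timedelta(days=1), ["technology"]),
]
short_term = interest_tags(HISTORY, NOW, max_reads=30, max_days=7, n=1)
long_term = interest_tags(HISTORY, NOW, max_reads=300, max_days=180, n=2)
```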

Further, after obtaining at least one of corresponding index data, vector data, and relationship data based on the classification labels, vector features, and association relations, the server may also form relationship data between the original data, such as the system to which each original data belongs and its catalog classification, according to the hierarchical catalog, classification, and the like provided by the original system.

In one embodiment, constructing a feature library corresponding to each raw data based on each raw data may include: constructing a plurality of initial feature libraries; and performing library storage on each original data according to the system type of the original system corresponding to each original data and the data type of each original data to obtain a plurality of feature libraries corresponding to the original data.

In this embodiment, for different types of original systems and different data types, the server may fully consider the sub-library and sub-table design, for example, establish a title-level vector library, a paragraph-level vector library, a fragment-level vector library, and the like for a web page, and lay a data architecture foundation for subsequent performance optimization based on the sub-library and sub-table correlation.

In this embodiment, according to the system type of each original system and the data type of each original data, the server may construct a corresponding initial feature library, and store the corresponding original data in the corresponding initial feature library to obtain a plurality of feature libraries.

In one embodiment, filtering the search area according to the search data and the object data and determining the target search area for data search may include: determining a search right of the search object based on the object data, wherein the search right comprises organizational structure authorization and/or user group authorization; and filtering the search area according to the search right and the search data to determine the target search area for data search.

In this embodiment, in order to guarantee efficiency in the search and recommendation processes and to guarantee safe access to data, the server may authorize each search object by organizational structure or by user group, so as to implement authority management and control.

Specifically, the server may specify the organizations or the user list authorized for each piece of information when establishing the search area, i.e., when building the feature library. When a search object subsequently performs a data search, the authorized search is carried out based on the organizational structure or the user group to which the search object belongs.

In this embodiment, when performing data search filtering, the server may determine, according to the object data, a search right of a search object, such as organization authorization or authorization according to a user group, and perform filtering of the search area, such as filtering out a search area that is not authorized to access, and only reserving an accessible search area to obtain a target search area.

Specifically, the server may match (collide) the organization or user ID of the current search object against the organization or user IDs recorded in the feature library, filter and screen the data to form an information list, and thereby determine the target search area that is searchable within the user's authority. After searching based on the search data, the results are displayed to the search object for viewing, which keeps information access safe and controllable.

In this embodiment, filtering of the search area based on the search data and filtering based on object authentication may be performed synchronously. That is, the search data and the object rights are applied as a combined double filter to obtain a target search area, i.e., a target database, that satisfies both the rights of the search object and the search data.

In one embodiment, the server determines that the current querying user has explicit organization information, such as 1026, which identifies the organizational structure corresponding to the current user. The server can then query the corresponding content based on the user's organizational structure information.
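The matching of IDs described above can be sketched as a set intersection. The record layout below, with authorized organization IDs and authorized user IDs stored on each record, is a hypothetical structure chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Record:
    record_id: str
    authorized_orgs: Set[str] = field(default_factory=set)
    authorized_users: Set[str] = field(default_factory=set)


def authorized_records(records: List[Record], org_ids: Set[str], user_id: str) -> List[str]:
    """Keep the records whose authorization matches the search object's
    organization IDs or user ID; everything else is filtered out."""
    return [
        r.record_id
        for r in records
        if (r.authorized_orgs & org_ids) or user_id in r.authorized_users
    ]


# Hypothetical feature-library records.
RECORDS = [
    Record("news-1", authorized_orgs={"1026", "2001"}),
    Record("product-7", authorized_orgs={"3005"}),
    Record("memo-3", authorized_users={"u42"}),
]
visible = authorized_records(RECORDS, org_ids={"1026"}, user_id="u42")
```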

In one embodiment, the filtering the search area according to the search data and the object data to determine the target search area may include: preprocessing the search data to obtain preprocessed search data; and filtering the search area according to the object data and the preprocessed search data, and determining a target search area.

In this embodiment, referring to fig. 7, when the server performs data search, a plurality of processing flows may be included, that is, input of search data, preprocessing of search data, recall of searched data, sorting of searched data, and the like. Optionally, reordering, presentation, etc. may also be included.

Specifically, after acquiring the search data, that is, after receiving the information input by the search object, the server may identify the search intention, filter the search area based on the identified intention and the current search data, and present the matching content from the filtered target search area to the search object as a list, so as to improve the user experience and the accuracy of the intelligent search.

In one embodiment, the preprocessing the search data to obtain the preprocessed search data may include: carrying out standardized conversion processing on the search data to obtain standardized search data; performing intention identification conversion processing on the search data to generate corresponding intention identification information; and obtaining the preprocessed search data according to the standardized search data and the intention identification information.

Specifically, referring to fig. 8, the scheme supports multi-modal input search, and can search through three modes, namely text, voice and picture, that is, a search object can input characters, and can also search through speaking or scanning a specific picture.

In this embodiment, for voice input, the server may provide a voice monitoring function at a terminal (e.g., a mobile terminal or a PC terminal). The search object may wake up the voice interaction function with a wake-up word, e.g., "science and technology". The server then captures the speech of the search object and converts it into text through an ASR (Automatic Speech Recognition) interface of a voice service provider, e.g., "science and technology, please help me search a full map", and extracts the key information "full map" in combination with a preset name expression. It will be appreciated by persons skilled in the art that the above is by way of example only.

Similarly, for the picture, the server may scan the picture inside or outside the terminal through an OCR (Optical Character Recognition) tool, and automatically extract the text in the picture, including the common picture and common certificates, such as an id card, a birth certificate, a foreign passport, and an account book.

Further, for text, the server may directly acquire the input text information.

In this embodiment, the server may perform a standardized conversion process of the search data after obtaining the search data in a text form by various methods such as voice recognition and image recognition.

Specifically, the server can perform the standardized conversion processing of the text search data by combining the preprocessing method in the general field and the preprocessing method in the professional field.

In this embodiment, the general-field preprocessing mainly includes case conversion, full-width/half-width conversion, traditional/simplified Chinese conversion, length truncation, and the like. The specific preprocessing flow is as follows:

First, the server may perform word segmentation, that is, segment the input text search data using a word segmentation model together with a domain lexicon to obtain fine-grained words. For example, segmenting a query about "a certain life insurance" yields "a certain", "life", "insurance", "good", and "do".

Then, the server may perform full-width/half-width conversion on the segmented data, converting a query entered in the input method's full-width mode into half-width form. This mainly affects English letters, digits, and punctuation marks, for example converting the full-width input "ｗｅｃｈａｔ １２３" into the half-width "wechat 123".

Furthermore, the server may perform case conversion, uniformly converting uppercase letters to lowercase.

Further, the server may perform traditional/simplified conversion, converting traditional Chinese input into simplified Chinese. In some cases, considering differences among user groups and the existence of traditional Chinese resources, the traditional Chinese input before conversion also needs to be retained for recall.

Further, the server may remove meaningless symbols, such as special symbol content like "Martian script" characters and emoji.

Further, the server may perform query truncation, truncating any query that exceeds a certain length.

Those skilled in the art can understand that the above processing procedures are not in a sequential order, and the server may select part or all of the preprocessing in the general field based on the triggering of the actual application requirement, and perform the combined processing according to a certain order.
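Several of the general-field steps can be sketched with the Python standard library alone. The sketch below covers full-width to half-width conversion, case conversion, symbol removal, and truncation. Word segmentation and traditional-to-simplified conversion need a model or a mapping table and are omitted. Treating every character in a Unicode symbol category as "meaningless", the chosen order of steps, and the default length limit are assumptions of this sketch.

```python
import unicodedata


def to_half_width(text: str) -> str:
    """Convert full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:     # full-width '!' .. '~'
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


def remove_symbols(text: str) -> str:
    """Drop characters in Unicode symbol categories (Sm, Sc, Sk, So), e.g. emoji."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("S"))


def normalize_query(text: str, max_len: int = 38) -> str:
    """Apply half-width conversion, lowercasing, symbol removal, then truncation."""
    text = to_half_width(text).lower()
    text = remove_symbols(text)
    return text[:max_len]
```

As noted above, the steps have no fixed order, so a deployment could reorder or omit them as required.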

In this embodiment, the server may also perform professional-field preprocessing on the text search data.

Specifically, the server may construct an industry-specific lexicon, which may include building up the existing lexicon and discovering new words. On the basis of the professional-field lexicon, the input information is standardized for the professional field, for example by applying the lexicon to query segmentation, while also providing functions such as professional-field vocabulary association.

In this embodiment, to construct the domain lexicon, the server may apply the new word discovery algorithm to the domain corpus to obtain domain dictionary 1, and likewise apply the new word discovery algorithm to the general corpus to obtain general dictionary 1.

Specifically, the new word discovery algorithm may perform multiple processes such as statistics, segmentation, backtracking, and the like.

For the statistical process, the server counts the 2-grams, calculates their internal solidity, and retains only the segments above a certain threshold, forming a set G. In practical applications, the server may set different thresholds for 2-grams, 3-grams, …, and n-grams. The more characters a segment has, the less sufficient its statistics are and the more likely its solidity is to be overestimated, so the thresholds may be set according to the actual application.

Further, the server may use the retained grams to segment the corpus (coarse segmentation) and count frequencies. The segmentation rule is that a segment appearing in the set G obtained in the previous step is not split. Taking a three-character string such as "each item" as an example, as long as its two overlapping two-character sub-segments are both in G, the string is kept unsplit even if the string itself is not in G.

Further, the server performs a backtracking check after completing the segmentation. Specifically, if a segmented word has at most n characters, the server checks whether it is in G and removes it if it is not. If a segmented word has more than n characters, the server checks whether each of its n-character segments is in G and removes the word as soon as one segment is not in G. Taking "each item" as an example again, backtracking checks whether "each item" is in the 3-grams and removes it if it is not.

In this embodiment, the server may change the 2-grams into 3-grams, iterate the above statistics, segmentation, and backtracking processes, and terminate the processes when iterating to 5-grams, thereby obtaining the dictionary.
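The statistics and coarse segmentation steps can be sketched for the 2-gram case as follows. This embodiment does not define "internal solidity"; the sketch assumes the common ratio P(ab) / (P(a) · P(b)), and the function names and sample corpus are hypothetical. The backtracking check and the iteration up to 5-grams are not shown.

```python
from collections import Counter
from typing import Iterable, List, Set


def solid_bigrams(corpus: Iterable[str], threshold: float) -> Set[str]:
    """Statistics step: keep the 2-grams whose solidity reaches the threshold."""
    chars, bigrams = Counter(), Counter()
    for sentence in corpus:
        chars.update(sentence)
        bigrams.update(sentence[i:i + 2] for i in range(len(sentence) - 1))
    total_chars, total_bigrams = sum(chars.values()), sum(bigrams.values())
    kept = set()
    for gram, count in bigrams.items():
        p_gram = count / total_bigrams
        p_parts = (chars[gram[0]] / total_chars) * (chars[gram[1]] / total_chars)
        if p_gram / p_parts >= threshold:   # assumed solidity: P(ab) / (P(a) P(b))
            kept.add(gram)
    return kept


def coarse_segment(sentence: str, kept: Set[str]) -> List[str]:
    """Segmentation step: never cut between two characters whose 2-gram is kept."""
    if not sentence:
        return []
    words = [sentence[0]]
    for i in range(1, len(sentence)):
        if sentence[i - 1:i + 1] in kept:
            words[-1] += sentence[i]
        else:
            words.append(sentence[i])
    return words


CORPUS = ["abab", "abab", "acbd"]
G = solid_bigrams(CORPUS, threshold=2.0)
```

In `coarse_segment`, a three-character string stays whole when both of its overlapping 2-grams are in the kept set, even though the string itself is not, which is the behavior the backtracking check later verifies.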

In this embodiment, the server may count the occurrence probability, in the domain corpus, of each word in domain dictionary 1, and the occurrence probability, in the general corpus, of each word in general dictionary 1.

Further, the server may subtract, from the probability of each word in domain dictionary 1, the probability of the corresponding word in general dictionary 1 (taken as 0 if the word does not occur in the general dictionary), and place the words whose probability difference is greater than a threshold into domain dictionary 2. Domain dictionary 2 is the final domain dictionary.
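The probability-difference filter can be written directly as a set comprehension. The dictionaries are assumed to map each word to its occurrence probability; the sample words and values are hypothetical.

```python
from typing import Dict, Set


def build_domain_dictionary(domain_probs: Dict[str, float],
                            general_probs: Dict[str, float],
                            threshold: float) -> Set[str]:
    """Keep words whose domain probability exceeds their general probability
    by more than the threshold; words absent from the general dictionary
    are treated as having probability 0."""
    return {
        word
        for word, p_domain in domain_probs.items()
        if p_domain - general_probs.get(word, 0.0) > threshold
    }


DOMAIN_PROBS = {"reinsurance": 0.020, "company": 0.030, "underwriting": 0.015}
GENERAL_PROBS = {"company": 0.028}
domain_dictionary_2 = build_domain_dictionary(DOMAIN_PROBS, GENERAL_PROBS, threshold=0.01)
```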

In this embodiment, the server may also perform intention recognition conversion processing on the search data to generate corresponding intention recognition information.

Specifically, the server may refine the search sentence pattern of the daily content based on the existing information of the content platform, in combination with the information of the organization, the post, and the like to which the user belongs, thereby forming a logical tree for search intention recognition. In practical application, the server can continuously iterate and update the logic data, so that the logic tree is richer, and the accuracy of intention identification is further improved.

In this embodiment, the server may generate corresponding intention identification information based on the logic tree. For example, the intention identification information corresponding to "news", to "what important events have happened these days", and to "what important news does the company have recently" is in each case "newly released news". That is, for different ways of asking, the server can obtain the same intention identification information based on the logic tree.

In this embodiment, the server may use the standardized search data and the intention identification information as the preprocessed search data for performing the data search of the target search area.

In one embodiment, the filtering the search area according to the search data and the object data to determine the target search area may include: and according to the search data and the object data, filtering the target library and/or the target table and/or the target field, and determining a corresponding target search area.

In this embodiment, since the amount of information stored in the server data is huge, in order to improve the efficiency and accuracy of the search, when the server performs the filtering, the server may determine the corresponding target search area by filtering the target library and/or the target table and/or the target field with reference to fig. 9, that is, the server may implement library-level filtering, table-level filtering, and field-level filtering.

In this embodiment, for library-level filtering, the server may determine the libraries that can be searched according to the organizational structure to which the search object belongs, narrowing the search range at the library level. For example, in a group company, a claims assessor of professional company A may only access the news and knowledge libraries related to professional company A, and not the product libraries of professional company B, and the like.

Further, the server may perform table-level filtering on the target libraries that can be searched, and, based on the identified search intention of the search object, decide whether to screen only the relevant tables under each library. Taking the intranet portal as an example, tables may be established by categories such as news, IT, personnel, training, finance, procurement, workshop, secretarial affairs, administration, strategy, party building, and the like. Specifically, if the current search object inputs "party building", only the party building table and the news table need to be searched with priority, that is, only 2 tables need to be searched instead of the 12 tables that would otherwise be visited, which further reduces the amount of data to be searched and improves search efficiency.

In this embodiment, the server may perform field-level filtering on the determined target table. Specifically, based on basic information such as the organization or the post to which the search object belongs, matching (collision) filtering is performed against the authorized organizations or user groups stored in the target table, so that primary authentication and field-level filtering are realized and the final target search area is obtained.
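The three filtering levels can be sketched as one nested pass over a catalog. The catalog layout (libraries holding authorized organizations and tables, rows holding their own authorized organizations) and all names are hypothetical structures chosen to make the cascade visible.

```python
from typing import Dict, List, Set, Tuple


def target_search_area(catalog: Dict[str, dict], org_id: str,
                       intent_tables: Set[str]) -> Dict[Tuple[str, str], List[str]]:
    """Return {(library, table): [row IDs]} that the search object may search."""
    area = {}
    for lib_name, lib in catalog.items():
        if org_id not in lib["orgs"]:                       # library-level filtering
            continue
        for table_name, rows in lib["tables"].items():
            if intent_tables and table_name not in intent_tables:
                continue                                    # table-level filtering
            allowed = [row["id"] for row in rows if org_id in row["orgs"]]
            if allowed:                                     # field-level filtering
                area[(lib_name, table_name)] = allowed
    return area


# Hypothetical catalog for two professional companies.
CATALOG = {
    "company_a_portal": {
        "orgs": {"A"},
        "tables": {
            "news": [{"id": "n1", "orgs": {"A"}}, {"id": "n2", "orgs": {"B"}}],
            "party building": [{"id": "p1", "orgs": {"A"}}],
            "finance": [{"id": "f1", "orgs": {"A"}}],
        },
    },
    "company_b_products": {"orgs": {"B"}, "tables": {"products": [{"id": "x1", "orgs": {"B"}}]}},
}
area = target_search_area(CATALOG, org_id="A", intent_tables={"party building", "news"})
```

An empty `intent_tables` set means no table-level narrowing is applied in this sketch.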

In one embodiment, performing data search on the target search area based on the search data to obtain corresponding target index information may include: and performing keyword search and/or semantic search and/or association relation search on the target search area based on the search data to obtain corresponding target index information.

In this embodiment, the server may provide a plurality of different types of search support, such as three types of search support, that is, keyword search, semantic search, and association search.

Specifically, for keyword search, after preprocessing the acquired search data, the server may identify one or more search keywords and search the feature library using those keywords.

In this embodiment, for semantic search, the server may search the vector library according to the vectors of the search data.

In this embodiment, the vector library may include a chapter-level vector library and a segment-level or paragraph-level vector library, and the server may preferentially search in the chapter-level vector library, and if the search is not found, search is performed from the segment-level or paragraph-level vector library.
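The chapter-first lookup with fallback can be sketched as below. This embodiment does not say what counts as "not found"; the sketch assumes it means that no chapter-level vector reaches a minimum cosine similarity. The threshold value and all names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def search_library(query: Sequence[float], library: Dict[str, Sequence[float]],
                   min_score: float, top_k: int) -> List[str]:
    """Return up to top_k IDs whose vectors score at least min_score."""
    hits = [(cosine(query, vec), doc_id) for doc_id, vec in library.items()]
    hits = [hit for hit in hits if hit[0] >= min_score]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits[:top_k]]


def search_with_fallback(query: Sequence[float],
                         chapter_library: Dict[str, Sequence[float]],
                         segment_library: Dict[str, Sequence[float]],
                         min_score: float = 0.8, top_k: int = 5) -> List[str]:
    """Search the chapter-level library first; fall back to the segment level."""
    hits = search_library(query, chapter_library, min_score, top_k)
    return hits if hits else search_library(query, segment_library, min_score, top_k)
```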

In the present embodiment, for the association relation search, the server may perform based on the graph database.

In this embodiment, the graph database may include relationships between persons and original data in addition to the relationships between original data and original data described above. The server can search based on the relationship between people, the relationship between people and original data, and the relationship between original data to obtain corresponding target index information.

In one embodiment, performing data search on the target search area based on the search data to obtain corresponding target index information may include: respectively carrying out data search on target search areas of at least one of an index library, a vector library and a relational library based on search data to obtain corresponding index sub-information; and merging the index sub-information, and performing duplicate removal processing to obtain corresponding target index information.

In the present embodiment, as described above, the search area constructed by the server may include an index library, a vector library, and a graph library. When the server searches data, the server can search at least one of the index library, the vector library and the relation library respectively to obtain corresponding index sub-information. For a specific search process, reference may be made to the foregoing description, which is not repeated herein.

In this embodiment, after obtaining the index sub-information from each of the index library, the vector library, and the relation library, the server may generate a result list, and merge and deduplicate the results after analysis and judgment to obtain the corresponding target index information.
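Merging and deduplication can be sketched as an order-preserving union of the per-library result lists. The sketch assumes each piece of index sub-information is identified by a hashable ID; the sample IDs are hypothetical.

```python
from typing import Hashable, Iterable, List


def merge_and_deduplicate(*result_lists: Iterable[Hashable]) -> List[Hashable]:
    """Concatenate result lists, keeping only the first occurrence of each ID."""
    seen = set()
    merged = []
    for results in result_lists:
        for item in results:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


index_hits = ["doc1", "doc2"]
vector_hits = ["doc2", "doc3"]
graph_hits = ["doc1", "doc4"]
target_index_info = merge_and_deduplicate(index_hits, vector_hits, graph_hits)
```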

In one embodiment, the target data is a plurality of target data, and after the server determines the corresponding target data based on the target index information, the method may further include: determining a target weight corresponding to each target data according to a plurality of target data and/or object data; ranking the plurality of target data based on the target weights; and pushing the sorted target data to a search object.

In this embodiment, the target data determined by the server based on the target index information may be multiple, that is, multiple target data may be searched. The server can intelligently sort the obtained target data.

Specifically, the server may provide intelligent result ranking based on factors such as the relevance of the searched target data to the work of the search object and the release time of the target data. For example, the server may calculate a target weight for each target data based on the target data and the object data of the search object. Alternatively, the server may determine the weights based on the target data alone.

Specifically, since the searched target data will be derived from the index library, the vector library and the graph database, in order to ensure the accuracy and richness of the search, the server may set different weights among different data sources, for example, the server may set initial weights of 40%, 40% and 20% for the index library, the vector library and the graph database, and then the server may automatically adjust the corresponding weights according to the click condition of the searched target data by the search object. In addition, the server can set corresponding weights for target data of different source types, and self-learning and adjustment can be performed according to the clicking condition of the user, so that the sequencing accuracy is improved.
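A minimal sketch of source-weighted ranking with the initial weights of 40%, 40%, and 20% follows. Summing the weighted scores of a document found in several sources is an assumption of this sketch, as are the hit layout and names. The click-based adjustment of the weights is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Initial source weights from the example above.
INITIAL_WEIGHTS = {"index": 0.4, "vector": 0.4, "graph": 0.2}

Hit = Tuple[str, str, float]  # (document ID, source library, relevance score)


def rank_hits(hits: List[Hit], weights: Dict[str, float] = INITIAL_WEIGHTS) -> List[str]:
    """Order documents by the sum of their source-weighted scores (assumed rule)."""
    totals: Dict[str, float] = defaultdict(float)
    for doc_id, source, score in hits:
        totals[doc_id] += weights[source] * score
    return sorted(totals, key=lambda doc_id: (-totals[doc_id], doc_id))


HITS = [
    ("a", "index", 1.0),
    ("b", "vector", 0.5),
    ("a", "graph", 0.5),
    ("c", "graph", 0.5),
]
ranking = rank_hits(HITS)
```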

In this embodiment, the server may push the sorted target data, in the form of a data list, to the user side, i.e., to the search object, so that the search object can process it further, for example by selecting a certain target data and viewing its detailed content.

In one embodiment, the method may further include: acquiring target object data of a target object to be pushed; determining the object type of each target object to be pushed based on each target object data; and acquiring corresponding push data according to each object type, and pushing the push data to the target object.

In this embodiment, the server may also provide an intelligent recommendation service for each known object.

Specifically, the server may determine, according to each target object data, the object type of the target object to be pushed, such as internal (back-office) staff of the enterprise, field staff, and clients. The intelligent recommendation strategy needs to differ for different target objects, that is, the server may obtain the push data corresponding to each object type and push it.

In this embodiment, for internal staff, the server's pushes focus on management and office-related information, enabling daily work. For field staff, the focus is on timely announcement and pushing of products and services, enabling sales and business development. For clients, the server may focus on pushing service information in the early stage to improve user experience, and gradually add product information pushes after a certain accumulation.

In one embodiment, the method may further include: acquiring push data to be pushed and target object data of a target object to be pushed, wherein the target object data comprises personal characteristic data and historical access data; dividing the target objects to be pushed into groups based on the target object data to generate object groups; establishing association relations among the push data to be pushed; and pushing the push data to be pushed according to any one of the personal characteristic data, the historical access data, the object groups, and the association relations.

In this embodiment, the server may combine the push data with the target object data of the target object to be pushed, apply customized optimizations to mainstream feature-association and collaborative-filtering algorithms, and provide multiple push modes, such as recommendation based on basic personal information, recommendation based on personal behavior, user-based collaborative recommendation, and content-based collaborative recommendation.

In this embodiment, the server provides four types of recommendation strategies, as shown in Table 3 below.

Table 3

In this embodiment, content recommendation by the server is essentially the fitting of a function that estimates a user's satisfaction with the push data (such as articles, pictures, audio, and video). The function takes input variables along three dimensions, namely content, environment, and user, which can be summarized into four features: content relevance, content popularity, environmental features, and user collaborative features, as shown in Table 4 below.

Table 4

The relevance feature evaluates the attributes of the push data and whether the push data matches the target object. Explicit matches include keyword matches, category matches, source matches, topic matches, and the like.

The environmental characteristics may include geographic location, time, weather, network environment.

Content popularity is especially effective for information recommendation during a cold start of the system, that is, when no historical behavior of the user is available.

The collaborative feature analyzes the similarity between different objects through their behavior, such as click similarity, interest-category similarity, topic similarity, interest-word similarity, and even vector similarity, in order to recommend related content.

In this embodiment, the features of each piece of push data include, in addition to the classification of its content, the quality, popularity, and originality of the content, the recommendation weight of the publishing object, and the like.

In this embodiment, the server may push based on an initial recommendation mechanism.

Specifically, when recommending based on the relevance feature, the server can automatically match keywords, categories, topics, and sources against the features of articles the user has read in the past, and recommend similar articles.

In this embodiment, recommendation based on environmental features works as follows. For geographic location, content with the same geographic information, for example the same city, is given local priority. For time, highly time-sensitive content is pushed preferentially on special occasions (such as the centenary of the founding of the Party or the Tokyo Olympic Games). For weather, content matching the current weather conditions is pushed preferentially, for example disaster notifications, rescue progress, and early-warning information during the Henan floods. For the network environment, the server monitors the user's network signal strength in the background (2G/3G/4G/5G, etc.) and flexibly adjusts the weights of the recommended content, for example 40% video, 30% image-text articles, 20% pictures, and 10% text in a 5G network environment.
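To make the network-dependent content mix concrete, the sketch below splits a fixed number of recommendation slots according to the 5G proportions given above. The function name `allocate_slots`, the content-type keys, and the use of the largest-remainder method for rounding are assumptions made for illustration; the description does not specify how the percentages are turned into item counts, nor the mixes for other network types.

```python
# 5G content mix from the description, as integer percentages.
FIVE_G_MIX = {"video": 40, "article": 30, "picture": 20, "text": 10}


def allocate_slots(mix, total_slots):
    """Split total_slots across content types with the largest-remainder method."""
    scaled = {kind: pct * total_slots for kind, pct in mix.items()}
    slots = {kind: value // 100 for kind, value in scaled.items()}
    leftover = total_slots - sum(slots.values())
    # Hand the remaining slots to the types with the largest remainders.
    by_remainder = sorted(scaled, key=lambda kind: scaled[kind] % 100, reverse=True)
    for kind in by_remainder[:leftover]:
        slots[kind] += 1
    return slots
```

For a feed of ten items this yields four videos, three image-text articles, two pictures, and one text item; for slot counts that do not divide evenly, the remainder rule keeps the total exact.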

Further, for the popularity feature, the server may compute a popularity score, adjust it using weights determined by the corresponding weighting rules, and push accordingly.

Specifically, the popularity calculation may cover global popularity and keyword popularity. Global popularity is computed over the content data that the server distributes and recommends: the counts of object behaviors such as reading, commenting, forwarding, liking, disliking, and collecting are combined with factors such as the content publishing time through weighted summation and index processing to obtain a global popularity score. Popularity for keywords, categories, and topic words is determined in the same way.
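A global popularity score of this kind could be sketched as below. The per-behavior weights, the negative weight for dislikes, and the exponential half-life decay used to account for publishing time are all assumptions chosen for illustration; the description names the inputs but gives no concrete weights or decay function.

```python
# Hypothetical per-behavior weights; the description does not give values.
BEHAVIOR_WEIGHTS = {
    "read": 1.0,
    "comment": 3.0,
    "forward": 4.0,
    "like": 2.0,
    "dislike": -2.0,
    "collect": 3.0,
}


def global_popularity(counts, hours_since_publish, half_life_hours=24.0):
    """Weighted sum of behavior counts, decayed by content age.

    The half-life decay is an assumed way to factor in publishing time.
    """
    raw = sum(weight * counts.get(name, 0) for name, weight in BEHAVIOR_WEIGHTS.items())
    decay = 0.5 ** (hours_since_publish / half_life_hours)
    # Clamp at zero so heavily disliked content does not go negative.
    return max(raw, 0.0) * decay
```

With these assumed values, an item with 100 reads and 10 likes scores 120 at publication and half of that one half-life later.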

Further, the weighting rules include popularity up-weighting and popularity down-weighting. For up-weighting, weight is added when the published content contains multiple pictures, and when it contains a topic or keyword enclosed in double # marks. Down-weighting applies in the following cases: content containing external links; the second piece of content from the same user in the same list period (each user may have at most two pieces of content on the list in one period, and the second is reduced in popularity); similar content, which is down-weighted again; accounts in non-original sub-categories; and content that the user has already seen.
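The up-weighting and down-weighting rules can be expressed as multiplicative factors applied to a base popularity score. The sketch below covers a subset of the rules; every multiplier, the item field names, and the function name `adjusted_popularity` are hypothetical, since the description states which conditions raise or lower weight but not by how much.

```python
import re

# A topic or keyword wrapped in a pair of # marks, e.g. "#newproduct#".
TOPIC_PATTERN = re.compile(r"#[^#\s]+#")


def adjusted_popularity(base, item, author_rank_in_list=1, already_seen=False):
    """Apply assumed up-weight and down-weight multipliers to a base score."""
    factor = 1.0
    if item.get("picture_count", 0) > 1:           # multiple pictures: up-weight
        factor *= 1.2
    if TOPIC_PATTERN.search(item.get("text", "")):  # double-# topic: up-weight
        factor *= 1.1
    if item.get("has_external_link"):               # external link: down-weight
        factor *= 0.8
    if author_rank_in_list >= 2:                    # author's 2nd listed item
        factor *= 0.5
    if already_seen:                                # user has seen it already
        factor *= 0.1
    return base * factor
```

Keeping the rules as independent multipliers makes it straightforward to tune each one separately after observing real traffic.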

In this embodiment, the collaborative features may include click similarity features and interest similarity features.

Specifically, the click similarity feature is described with reference to fig. 10. When user A and user B show similar click behavior on articles, such as viewing, forwarding, and collecting, recommendations can be made between them. For example, when user A has viewed contents 1 to 4 and user B has viewed only contents 2 and 3, the server may recommend contents 1 and 4, viewed by A, to user B.
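The user A / user B example can be reproduced with a small user-based collaborative filter. This is a sketch under assumptions: Jaccard similarity over viewed-item sets and the `min_similarity` threshold are choices made here for illustration, not details stated in the description.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_click_similarity(target, history, min_similarity=0.3):
    """Recommend items viewed by similar users that the target has not viewed.

    history: dict mapping user id -> set of viewed content ids.
    """
    seen = history[target]
    scores = {}
    for other, items in history.items():
        if other == target:
            continue
        similarity = jaccard(seen, items)
        if similarity < min_similarity:
            continue
        for item in items - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    # Strongest recommendation first; ties broken by content id.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

With `history = {"A": {1, 2, 3, 4}, "B": {2, 3}}`, calling `recommend_by_click_similarity("B", history)` returns `[1, 4]`, matching the example, while user A receives nothing new from B.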

Similarly, the server may also recommend based on content, which is described taking fig. 11 as an example. If both user A and user B have viewed content 2 and content 3, and the keywords, categories, sources, etc. of content 1 and content 4 are similar to those of content 2 and content 3, the server may recommend content 1 and content 4 to user A and user B.
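Content-based recommendation of this kind compares the features of unseen content with the features of content the user has already viewed, and recommends the unseen items that are close enough. The sketch below assumes each piece of content is described by a set of tags (keywords, category, source) and uses Jaccard similarity with a hypothetical threshold; the tag values in the usage note are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_by_content(seen_ids, catalog, min_similarity=0.5):
    """Recommend unseen content whose tags resemble any viewed content.

    catalog: dict mapping content id -> set of tags (keywords, category, source).
    """
    scored = []
    for content_id, tags in catalog.items():
        if content_id in seen_ids:
            continue
        best = max((jaccard(tags, catalog[s]) for s in seen_ids), default=0.0)
        if best >= min_similarity:
            scored.append((content_id, best))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return [content_id for content_id, _ in scored]
```

For a user who has viewed contents 2 and 3, a catalog in which contents 1 and 4 share most tags with 2 and 3 produces `[1, 4]`, and unrelated content is left out.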

Further, for the interest similarity feature, the server may push based on the user's interest categories, topics, words, etc., following the same rules as above.

In one embodiment, the method may further include: acquiring feedback data of the target object to the push data, wherein the feedback data comprises a click rate and a reading completion rate; and determining secondary push data corresponding to the target object based on the click rate, the reading completion rate and the push information of the push data, and pushing the secondary push data to the target object.

In this embodiment, after attempting to recommend a certain amount of push data, the server may determine whether to perform a new round of larger-range recommendation again or reduce the recommendation amount, i.e., perform a second push, according to the click rate and the reading completion rate of the recommended target user.

In this embodiment, the server may decide whether to perform a secondary push by computing a recommendation index from the click rate, the reading completion rate, and the push information of the push data, such as the publication time and popularity. The calculation may be as follows:

recommendation index = 35% × click rate + 20% × reading completion rate + 25% × publication time + 20% × popularity

The publication time may be measured over a window of 72 hours, 48 hours, or the like, which is not limited in this application. The popularity may be determined comprehensively from the click rate, reading completion rate, likes, forwards, collections, comment interactions, and the like.
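The recommendation index can be computed as a weighted sum once each term is normalized to the range 0 to 1. In the sketch below the four weights come from the formula above; the linear freshness score over a 72-hour window, the normalization of each input, and the push threshold are assumptions, since the description does not say how publication time is converted into a score or what index value triggers a secondary push.

```python
# Weights from the formula in the description.
WEIGHTS = {"click": 0.35, "completion": 0.20, "freshness": 0.25, "popularity": 0.20}


def freshness(hours_since_publish, window_hours=72.0):
    """Assumed linear score: 1.0 at publication, 0.0 at the end of the window."""
    return max(0.0, 1.0 - hours_since_publish / window_hours)


def recommendation_index(click_rate, completion_rate, hours_since_publish,
                         popularity, window_hours=72.0):
    """Weighted sum of the four terms; rate and popularity inputs are in [0, 1]."""
    parts = {
        "click": click_rate,
        "completion": completion_rate,
        "freshness": freshness(hours_since_publish, window_hours),
        "popularity": popularity,
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


def should_push_again(index, threshold=0.5):
    """Hypothetical decision rule for widening the recommendation."""
    return index >= threshold
```

For example, a click rate of 0.4, a completion rate of 0.5, 36 hours since publication, and a popularity of 0.6 give an index of about 0.485, which falls just below the assumed threshold.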

In one embodiment, the server may also make a recommendation time determination to facilitate pushing in a reasonable time.

Specifically, with the aim of letting more objects reach high-quality content, the server may determine the recommendation time by jointly considering three factors: the recommendation time, the recommendation amount, and the publication time.

In this embodiment, the longer the time since publication, the lower the recommendation amount becomes. The server can therefore identify the period of high recommendation amount after publication; the longer this period lasts, the better.

In this embodiment, the server may set the three time periods with the highest recommendation amount and, after a trial period, adjust them based on statistical analysis of actual results. For example: 3 hours in the morning, 07:00-10:00; 2 hours at noon, 11:00-13:00; and 3 hours at night, 22:00-01:00.
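Checking whether a given moment falls inside one of these periods needs some care, because the night period crosses midnight. The sketch below uses the three example periods from the text; the function names and the choice of half-open intervals (start included, end excluded) are assumptions.

```python
from datetime import time

# Example periods from the description; the last one wraps past midnight.
PEAK_WINDOWS = [
    (time(7, 0), time(10, 0)),
    (time(11, 0), time(13, 0)),
    (time(22, 0), time(1, 0)),
]


def in_window(moment, start, end):
    """True if moment lies in [start, end), handling windows that cross midnight."""
    if start <= end:
        return start <= moment < end
    return moment >= start or moment < end


def is_peak_time(moment, windows=PEAK_WINDOWS):
    """True if moment falls inside any configured high-recommendation period."""
    return any(in_window(moment, start, end) for start, end in windows)
```

Because the windows are plain data, the server could replace them after the statistical review mentioned above without touching the checking logic.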

In one embodiment, the server may also determine the recommendation location. For example, the server may present personalized recommended content on the home page according to the object characteristics of the target object, and may update the recommendations in real time. Further, the server may recommend by list, for example by generating hot lists for different categories according to popularity and recommending them to the user.

In one embodiment, after determining the corresponding target data based on the target index information, any one of the following items may be further included: constructing a display page corresponding to the target data based on a display request of the search object for the target data, and displaying through a terminal interface; and calling an original page of the original system based on the display request of the search object to the target data, and displaying the original page through a terminal interface.

In this embodiment, as described above, after determining the corresponding target data based on the target index information, the server may present the target data to the searching user in a list.

Further, the search object may request the display of specific data content by clicking on the displayed target data.

In this embodiment, the server may display the specific data content in a plurality of ways, for example, the server may generate a display page, such as a display card, and send the display page to the terminal interface for displaying. Or the server can directly call the original page of the original system and display the original page through a terminal interface.

In this embodiment, the server may perform intelligent card display according to the emotion, environment, and other factors expressed by the search data of the search object.

Specifically, referring to fig. 12, the server may collect user behavior through event tracking, construct a user portrait, and integrate environmental features, user features, and content features. The features are mapped into a vector space by an embedding method and then fused by concatenation or element-wise addition to obtain a user vector. When displaying cards to the terminal user, the server combines the user vector to output dynamic cards rich in emotional elements and personalized static cards, thereby achieving personalized display.
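The two fusion options, joining the feature vectors end to end or adding them element by element, differ mainly in the shape of the resulting user vector. The sketch below shows both on plain Python lists; the function names are hypothetical, and the embedding step that produces the input vectors is assumed to have already happened.

```python
def fuse_concat(*vectors):
    """Join the feature vectors end to end; output length is the sum of lengths."""
    return [value for vector in vectors for value in vector]


def fuse_add(*vectors):
    """Add the feature vectors element by element; all must share one length."""
    if len({len(vector) for vector in vectors}) != 1:
        raise ValueError("element-wise addition needs vectors of equal length")
    return [sum(values) for values in zip(*vectors)]
```

Concatenation keeps the environmental, user, and content features in separate positions at the cost of a longer vector, whereas element-wise addition keeps the dimension fixed but requires all embeddings to live in the same space.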

In one embodiment, invoking an original page of an original system based on a display request of a search object for target data, and displaying the original page through a terminal interface may include: calling a system interface of an original system corresponding to the target data based on a display request of the search object for the target data; carrying out authority verification on the searched object through a system interface; and after the verification is determined to be passed, calling an original page of the original system, and displaying the original page through a terminal interface.

In this embodiment, after the search object clicks the target data, the server may obtain the unified identity authentication token, pass it as a parameter, and directly call the display interface of the original system corresponding to the target data. The original system authenticates the token and displays the page once authentication passes. For example, for news on the intranet portal the news display page is called directly, and for forms in the OA system the form display page is called directly.

Specifically, when display relies on authority verification by the original system, the server may call a system interface of the original system and verify the authority of the search object through that interface, which keeps the result consistent with the authority control standard of the original system. The server then calls the original page of the original system and displays it through the terminal interface, thereby achieving secondary authentication.
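The secondary authentication flow can be simulated in a few lines. Everything in this sketch is hypothetical: the `OriginalSystem` class stands in for a real source system such as an OA or intranet portal, the token table and access lists are in-memory stand-ins for that system's own identity and authority checks, and `display_original_page` is an invented name for the server-side call.

```python
class OriginalSystem:
    """In-memory stand-in for a source system with its own authority control."""

    def __init__(self, valid_tokens, access, pages):
        self.valid_tokens = valid_tokens  # token -> user id
        self.access = access              # user id -> set of readable doc ids
        self.pages = pages                # doc id -> original page content

    def show(self, token, doc_id):
        user = self.valid_tokens.get(token)
        if user is None:
            raise PermissionError("token authentication failed")
        if doc_id not in self.access.get(user, set()):
            raise PermissionError("user may not view this document")
        return self.pages[doc_id]


def display_original_page(systems, target, token):
    """Route the display request to the system that owns the target data."""
    system = systems[target["system"]]
    # The original system, not the search platform, makes the final decision.
    return system.show(token, target["doc_id"])
```

The point of the design is that a search hit alone never grants access: the page is returned only if the owning system accepts both the token and the user's authority for that document.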

It should be understood that, although the steps in the flowchart of fig. 2 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in fig. 2 may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in fig. 13, there is provided a cross-system data acquisition apparatus including: a search request receiving module 100, a target search area determining module 200, a target index information determining module 300, and a target data determining module 400, wherein:

the search request receiving module 100 is configured to receive a search request, where the search request carries search data corresponding to target data to be searched and object data of a search object;

the target search area determining module 200 is configured to filter the search area according to the search data and the object data, and determine a target search area for performing data search, where the search area is constructed based on original data of multiple original systems;

the target index information determining module 300 is configured to perform data search on the target search area based on the search data to obtain corresponding target index information;

the target data determining module 400 is configured to determine corresponding target data based on the target index information.

In one embodiment, the apparatus may further include:

the original data acquisition module is used for acquiring each original data of each original system;

the characteristic library construction module is used for constructing a characteristic library corresponding to the original data based on each original data, and the characteristic library comprises at least one of an index library, a vector library and a relation library;

in this embodiment, the target search area determining module 200 is configured to filter the search area of the feature library according to the search data and the object data, and determine the target search area.

In one embodiment, the raw data obtaining module may include:

and the acquisition mode determining submodule is used for determining a data acquisition mode corresponding to each original system based on the system parameters of each original system, wherein the data acquisition mode comprises at least one of data capture, interface acquisition and function registration acquisition.

And the original data acquisition submodule is used for acquiring corresponding original data from each original system according to each determined data acquisition mode.

In one embodiment, the feature library building module may include:

the classification submodule is used for performing horizontal classification and/or vertical classification on each original data to obtain each classified original data; the horizontal classification classifies according to the hierarchical label of the original data, and the vertical classification classifies according to the data characteristics of the original data.

And the generation submodule is used for carrying out data preprocessing on each classified original data to generate at least one of corresponding index data, vector data and relation data.

And the construction submodule is used for constructing a feature library corresponding to the original data based on at least one item of index data, vector data and relation data.

In one embodiment, the generation submodule may include:

and the word segmentation unit is used for performing word segmentation processing on each original data and obtaining corresponding index features based on word segmentation results.

And the label determining unit is used for determining the classification label of each original data based on the classification result of each original data.

And the similarity determining unit is used for determining the similarity between the original data according to the classification labels.

And the vector characteristic determining unit is used for obtaining the vector characteristics of each original data based on each original data.

And the association relation determining unit is used for determining the association relations between the original data according to the similarity between the original data and the vector features of each original data.

And the generating unit is used for obtaining at least one item of corresponding index data, vector data and relation data based on the index features, the vector features and the association relations.

In one embodiment, the feature library building module may include:

and the initial feature library constructing submodule is used for constructing a plurality of initial feature libraries.

And the database-dividing storage submodule is used for performing database-dividing storage on each original data according to the system type of the original system corresponding to each original data and the data type of each original data to obtain a plurality of feature databases corresponding to the original data.

In one embodiment, the target search area determination module 200 may include:

and the search permission determining sub-module is used for determining the search permission of the search object based on the object data, and the search permission comprises organization structure authorization and/or user group authorization.

And the first target search area determining submodule is used for filtering the search area according to the search authority and the search data and determining the target search area for searching the data.
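As a rough illustration of how these two submodules could cooperate, the sketch below derives a search permission from the object data and keeps only the search areas that the permission covers. The field names (`orgs`, `groups`, `allowed_orgs`, `allowed_groups`) and both function names are assumptions; the description states only that the permission comprises organization structure authorization and/or user group authorization.

```python
def determine_search_permission(object_data):
    """Collect the organization units and user groups the search object belongs to."""
    return {
        "orgs": set(object_data.get("orgs", [])),
        "groups": set(object_data.get("groups", [])),
    }


def filter_search_areas(areas, permission):
    """Keep areas authorized for at least one of the object's orgs or groups."""
    allowed = []
    for area in areas:
        by_org = area["allowed_orgs"] & permission["orgs"]
        by_group = area["allowed_groups"] & permission["groups"]
        if by_org or by_group:
            allowed.append(area["name"])
    return allowed
```

Filtering before the search runs means that data the search object is not entitled to see never enters the candidate set, rather than being hidden afterwards.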

In one embodiment, the target search area determination module 200 may include:

and the preprocessing submodule is used for preprocessing the search data to obtain the preprocessed search data.

And the second target search area determining submodule is used for filtering the search area according to the object data and the preprocessed search data and determining the target search area.

In one embodiment, the preprocessing submodule may include:

and the standardized conversion processing unit is used for carrying out standardized conversion processing on the search data to obtain standardized search data.

And the intention identification conversion processing unit is used for carrying out intention identification conversion processing on the search data and generating corresponding intention identification information.

And the generation subunit is used for obtaining the preprocessed search data according to the standardized search data and the intention identification information.

In one embodiment, the target search area determination module 200 is configured to perform filtering on a target library and/or a target table and/or a target field according to the search data and the object data, and determine a corresponding target search area.

In one embodiment, the target index information determining module 300 is configured to perform keyword search and/or semantic search and/or association search on the target search area based on the search data to obtain corresponding target index information.

In one embodiment, the target index information determining module 300 may include:

and the index sub-information acquisition submodule is used for respectively carrying out data search on a target search area of at least one of the index library, the vector library and the relational library based on the search data to acquire corresponding index sub-information.

And the target index information determining submodule is used for merging the index sub-information and carrying out duplicate removal processing to obtain corresponding target index information.
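Merging and de-duplicating the results returned by the individual libraries might look like the sketch below. The representation of each hit as a `(doc_id, score)` pair, the rule of keeping the highest score for a duplicated document, and the function name `merge_index_info` are assumptions made for illustration.

```python
def merge_index_info(*result_lists):
    """Merge hits from several libraries, keeping one entry per document.

    Each result list holds (doc_id, score) pairs; when a document appears in
    more than one list, the highest score is kept (an assumed rule).
    """
    merged = {}
    for results in result_lists:
        for doc_id, score in results:
            if doc_id not in merged or score > merged[doc_id]:
                merged[doc_id] = score
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(merged.items(), key=lambda pair: (-pair[1], pair[0]))
```

A document found by both the keyword search and the semantic search therefore appears once in the target index information.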

In one embodiment, the apparatus may further include:

and the target object data acquisition module is used for acquiring target object data of the target object to be pushed.

And the object type determining module is used for determining the object type of each target object to be pushed based on each target object data.

And the first pushing module is used for acquiring corresponding pushing data according to each object type and pushing the pushing data to the target object.

In one embodiment, the apparatus may further include:

and the pushed data target object data acquisition submodule is used for acquiring pushed data to be pushed and target object data of a target object to be pushed, and the target object data comprises personal characteristic data and historical access data.

And the group division module is used for carrying out group division on the target objects to be pushed based on the target object data to generate each object group.

And the association relation establishing module is used for establishing association relations among the push data to be pushed.

And the second pushing module is used for pushing the pushing data to be pushed according to any one of the personal characteristic data, the historical access data, the object group and the association relation.

In one embodiment, the apparatus may further include:

and the feedback data acquisition module is used for acquiring feedback data of the target object to the push data, wherein the feedback data comprises a click rate and a reading completion rate.

And the secondary pushing module is used for determining secondary pushing data corresponding to the target object based on the click rate, the reading completion rate and pushing information of the pushing data, and pushing the secondary pushing data to the target object.

In one embodiment, there are a plurality of pieces of target data.

In this embodiment, the apparatus may further include:

and the weight determining module is used for determining the target weight corresponding to each target data according to a plurality of target data and/or object data after determining the corresponding target data based on the target index information.

And the sequencing module is used for sequencing the target data based on the target weight.

And the pushing module is used for pushing the sorted target data to the search object.

In one embodiment, the apparatus may further include any one of the following modules:

and the first display module is used for establishing a display page corresponding to the target data based on a display request of the search object to the target data after determining the corresponding target data based on the target index information, and displaying the display page through a terminal interface.

And the second display module is used for calling an original page of the original system based on a display request of the search object for the target data after determining the corresponding target data based on the target index information, and displaying the original page through a terminal interface.

In one embodiment, the second display module may include:

and the calling submodule is used for calling a system interface of the original system corresponding to the target data based on the display request of the search object to the target data.

The authority verification submodule is used for performing authority verification on the search object through a system interface;

and the display sub-module is used for calling the original page of the original system after the verification is determined to be passed, and displaying the original page through a terminal interface.

For specific limitations of the cross-system data acquisition device, reference may be made to the above limitations of the cross-system data acquisition method, which are not repeated here. The modules in the cross-system data acquisition device may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the corresponding operations.

In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 14. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The database of the computer device is used for storing search data, object data, original data, target index information, target data and the like. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a cross-system data acquisition method.

Those skilled in the art will appreciate that the architecture shown in fig. 14 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.

In one embodiment, there is provided a computer device comprising a memory storing a computer program and a processor implementing the following steps when the processor executes the computer program: receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object; according to the search data and the object data, filtering the search area, and determining a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems; based on the search data, performing data search on the target search area to obtain corresponding target index information; based on the target index information, corresponding target data is determined.

In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of: receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object; according to the search data and the object data, filtering the search area, and determining a target search area for data search, wherein the search area is constructed based on the original data of a plurality of original systems; based on the search data, performing data search on the target search area to obtain corresponding target index information; based on the target index information, corresponding target data is determined.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims

1. A cross-system data acquisition method, the method comprising:

receiving a search request, wherein the search request carries search data corresponding to target data to be searched and object data of a search object;

filtering a search area according to the search data and the object data, and determining a target search area for data search, comprising: determining a second authority range according to the object data, and matching the second authority range with the first authority range of each search area to obtain the target search area; wherein the search area is constructed based on original data of a plurality of original systems, and each search area carries a first authority range;

based on the search data, performing data search on the target search area to obtain corresponding target index information;

determining corresponding target data based on the target index information, comprising: receiving a detail selection instruction for the target index information, calling, according to the detail selection instruction, the data authority in the original system corresponding to the search area to verify the second authority range, and outputting the corresponding target data when the verification succeeds.

2. The method of claim 1, further comprising:

acquiring each original data of each original system;

constructing a feature library corresponding to the original data based on each original data, wherein the feature library comprises at least one of an index library, a vector library and a relation library;

wherein the filtering the search area according to the search data and the object data to determine the target search area for data search comprises:

filtering the search area of the feature library according to the search data and the object data to determine the target search area.

3. The method of claim 2, wherein the acquiring each original data of each original system comprises:

determining a data acquisition mode corresponding to each original system based on system parameters of each original system, wherein the data acquisition mode comprises at least one of data capture, interface acquisition and function registration acquisition;

and acquiring corresponding original data from each original system according to each determined data acquisition mode.
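As a non-limiting sketch of claim 3, the acquisition mode can be chosen per original system from its system parameters and then dispatched to a matching fetcher. The parameter names `supports_registration` and `api_url`, and the precedence among the three modes, are assumptions made for illustration and are not taken from the claim.

```python
def choose_mode(system_params):
    """Pick an acquisition mode from (hypothetical) system parameters."""
    if system_params.get("supports_registration"):
        return "function_registration"
    if system_params.get("api_url"):
        return "interface"
    # Fall back to data capture when the system exposes nothing else.
    return "capture"


def acquire_all(systems, fetchers):
    """Acquire original data from every original system.

    systems:  system name -> system parameters
    fetchers: mode name -> callable(name, params) returning a list of records
    """
    data = {}
    for name, params in systems.items():
        mode = choose_mode(params)
        data[name] = (mode, fetchers[mode](name, params))
    return data
```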

4. The method of claim 2, wherein the constructing a feature library corresponding to the original data based on each original data comprises:

performing horizontal classification and/or vertical classification on each original data to obtain classified original data, wherein the horizontal classification classifies according to a hierarchical label of the original data, and the vertical classification classifies according to data characteristics of the original data;

performing data preprocessing on each classified original data to generate at least one of corresponding index data, vector data and relationship data;

and constructing a feature library corresponding to the original data based on at least one of the index data, the vector data and the relation data.

5. The method of claim 4, wherein the performing data preprocessing on each classified original data to generate at least one of corresponding index data, vector data and relationship data comprises:

performing word segmentation processing on each original data, and obtaining corresponding index features based on word segmentation results;

determining a classification label of each original data based on a classification result of each original data;

determining the similarity between the original data according to the classification labels;

obtaining the vector characteristics of each original data based on each original data;

determining an association relation between the original data according to the similarity between the original data and the vector features;

and obtaining at least one of corresponding index data, vector data and relationship data based on the index features, the vector features and the association relation.
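The preprocessing steps of claim 5 can be sketched, under stated assumptions, as follows. The claim does not fix any particular technique, so this sketch substitutes simple stand-ins: regular-expression tokenization for word segmentation, an inverted index for the index features, Jaccard overlap of classification labels for the similarity, bag-of-words counts for the vector features, and an equally weighted combination with a threshold of 0.5 for the association relation. All function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Stand-in for word segmentation: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Index features: token -> set of document ids containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def label_similarity(a, b):
    # Jaccard overlap between two sets of classification labels.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(
        sum(x * x for x in v.values())
    )
    return dot / norm if norm else 0.0


def build_relations(docs, labels, threshold=0.5):
    # Vector features: bag-of-words counts per document.
    vectors = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    ids = sorted(docs)
    relations = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            score = 0.5 * label_similarity(labels[a], labels[b]) + 0.5 * cosine(
                vectors[a], vectors[b]
            )
            if score >= threshold:
                relations.append((a, b))
    return relations
```

A production system would more likely use a dedicated segmenter and learned embeddings; the structure of the three outputs (index, vectors, relations) is the point of the sketch.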

6. The method of claim 2, wherein the constructing a feature library corresponding to the original data based on each original data comprises:

constructing a plurality of initial feature libraries;

and storing each original data into the initial feature libraries according to the system type of the original system corresponding to each original data and the data type of each original data, to obtain a plurality of feature libraries corresponding to the original data.

7. The method according to any one of claims 1 to 6, wherein the filtering the search area according to the search data and the object data to determine a target search area for data search comprises:

determining search permissions for the search object based on the object data, the search permissions including organizational structure permissions and/or user group permissions;

and filtering a search area according to the search permission and the search data, and determining a target search area for data search.

8. The method according to any one of claims 1 to 6, wherein the filtering the search area according to the search data and the object data to determine the target search area comprises:

preprocessing the search data to obtain preprocessed search data;

and filtering the search area according to the object data and the preprocessed search data, and determining a target search area.

9. The method of claim 8, wherein the preprocessing the search data to obtain preprocessed search data comprises:

carrying out standardized conversion processing on the search data to obtain standardized search data;

performing intention identification conversion processing on the search data to generate corresponding intention identification information;

and obtaining preprocessed search data according to the standardized search data and the intention identification information.
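A minimal sketch of the preprocessing in claim 9 follows. It assumes that standardized conversion means Unicode NFKC normalization, whitespace collapsing and lowercasing, and that intention identification is a keyword lookup applied to the standardized text. The intent names and keyword table are invented for illustration; a real system would likely use a trained classifier.

```python
import re
import unicodedata

# Hypothetical keyword table mapping an intent name to trigger words.
INTENT_KEYWORDS = {
    "find_person": ("who", "contact"),
    "find_document": ("policy", "report", "manual"),
}


def normalize(query):
    # Standardized conversion: fold full-width forms, collapse whitespace, lowercase.
    text = unicodedata.normalize("NFKC", query)
    return re.sub(r"\s+", " ", text).strip().lower()


def recognize_intent(normalized):
    # Intention identification: first intent whose keywords appear in the query.
    tokens = set(normalized.split())
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & set(keywords):
            return intent
    return "general"


def preprocess(query):
    # Preprocessed search data combines the standardized text and the intent.
    normalized = normalize(query)
    return {"query": normalized, "intent": recognize_intent(normalized)}
```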

10. The method according to any one of claims 1 to 6, wherein the filtering the search area according to the search data and the object data to determine the target search area comprises:

filtering a target library and/or a target table and/or a target field according to the search data and the object data, to determine a corresponding target search area.

11. The method according to any one of claims 1 to 6, wherein the performing a data search on the target search area based on the search data to obtain corresponding target index information includes:

performing keyword search and/or semantic search and/or association relation search on the target search area based on the search data to obtain corresponding target index information.

12. The method according to any one of claims 1 to 6, wherein the performing a data search on the target search area based on the search data to obtain corresponding target index information includes:

performing data search separately on the target search area of at least one of an index library, a vector library and a relation library based on the search data to obtain corresponding index sub-information;

and merging the index sub-information, and performing deduplication processing to obtain corresponding target index information.
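The merging and deduplication step of claim 12 can be illustrated as follows, assuming each library returns its index sub-information as `(document id, score)` pairs and that a duplicate keeps its highest score. The scoring and ordering rules are assumptions; the claim only requires merging and deduplication.

```python
def merge_results(*result_lists):
    """Merge per-library hits and drop duplicates, keeping the best score."""
    merged = {}
    for results in result_lists:
        for doc_id, score in results:
            if doc_id not in merged or score > merged[doc_id]:
                merged[doc_id] = score
    # Highest score first; ties broken by document id for a stable order.
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))
```

For example, the function can be called with one list per library: `merge_results(index_hits, vector_hits, relation_hits)`, where the three argument names are placeholders.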

13. The method according to any one of claims 1 to 6, further comprising:

acquiring target object data of a target object to be pushed;

determining the object class of each target object to be pushed based on each target object data;

and acquiring corresponding push data according to each object class, and pushing the push data to the target object.

14. The method according to any one of claims 1 to 6, further comprising:

acquiring push data to be pushed and target object data of a target object to be pushed, wherein the target object data comprises personal characteristic data and historical access data;

based on the target object data, carrying out group division on the target objects to be pushed to generate each object group;

establishing an association relation between the push data to be pushed according to the push data to be pushed;

and pushing the push data to be pushed according to any one of the personal characteristic data, the historical access data, the object group and the association relation.

15. The method of claim 14, further comprising:

obtaining feedback data of the target object to the push data, wherein the feedback data comprises a click rate and a reading completion rate;

and determining secondary push data corresponding to the target object based on the click rate, the reading completion rate and the push information of the push data, and pushing the secondary push data to the target object.
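One possible reading of claim 15 is sketched below. It assumes that the "push information" of an item is a topic tag, that interest is measured by the product of click rate and reading completion rate, and that secondary push data are unpushed items sharing a topic the target object responded to. The threshold of 0.25 and all names are hypothetical.

```python
def secondary_push(feedback, pushed, catalog, threshold=0.25):
    """Select follow-up items from feedback on earlier pushes.

    feedback: item id -> (click_rate, completion_rate)
    pushed:   item id -> {"topic": ...} for items already pushed
    catalog:  item id -> {"topic": ...} for all available items
    """
    # Topics whose pushed items were both clicked and read through often enough.
    liked_topics = {
        pushed[item_id]["topic"]
        for item_id, (click_rate, completion_rate) in feedback.items()
        if click_rate * completion_rate >= threshold
    }
    # Recommend items on those topics that have not been pushed yet.
    return [
        item_id
        for item_id, info in sorted(catalog.items())
        if info["topic"] in liked_topics and item_id not in pushed
    ]
```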

16. The method according to any one of claims 1 to 6, wherein there are a plurality of target data, and after the determining corresponding target data based on the target index information, the method further comprises:

determining a target weight corresponding to each target data according to the plurality of target data and/or the object data;

ranking the plurality of target data based on the target weights;

and pushing the sorted target data to the search object.
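The weighting and ranking of claim 16 can be sketched as below, assuming each target data item carries a relevance score and a department tag, and that the object data names the searcher's department. The bonus of 0.2 for a department match is an arbitrary illustrative value, not one given in the claim.

```python
def target_weight(item, object_data):
    # Start from the item's own relevance (derived from the target data).
    weight = item["relevance"]
    # Assumed personalization: boost items from the searcher's own department.
    department = object_data.get("department")
    if department is not None and item.get("department") == department:
        weight += 0.2
    return weight


def rank(items, object_data):
    # Sort target data by descending target weight before pushing.
    return sorted(
        items, key=lambda item: target_weight(item, object_data), reverse=True
    )
```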

17. The method according to any one of claims 1 to 6, wherein after the determining corresponding target data based on the target index information, the method further comprises any one of the following:

constructing a display page corresponding to the target data based on a display request of the search object for the target data, and displaying the display page through a terminal interface;

calling an original page of an original system based on the display request of the search object for the target data, and displaying the original page through the terminal interface.

18. The method according to claim 17, wherein the calling an original page of an original system based on the display request of the search object for the target data, and displaying the original page through the terminal interface comprises:

calling a system interface of the original system corresponding to the target data based on the display request of the search object for the target data;

performing authority verification on the search object through the system interface;

and after the verification is determined to be passed, calling an original page of the original system, and displaying the original page through the terminal interface.

19. An apparatus for cross-system data acquisition, the apparatus comprising:

a search request receiving module, configured to receive a search request, where the search request carries search data corresponding to target data to be searched and object data of a search object;

a target search area determination module, configured to filter a search area according to the search data and the object data to determine a target search area for data search, including: determining a second authority range according to the object data, and matching the second authority range against the first authority range of each search area to obtain the target search area; wherein the search area is constructed based on original data of a plurality of original systems, and each search area carries a first authority range;

a target index information determination module, configured to perform data search on the target search area based on the search data to obtain corresponding target index information;

a target data determination module, configured to determine corresponding target data based on the target index information, including: receiving a detail selection instruction directed at the target index information, invoking, according to the detail selection instruction, the data authority in the original system corresponding to the search area to verify the second authority range, and outputting the corresponding target data when the verification succeeds.

20. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 18 when executing the computer program.

21. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 18.