CN112667286A - Searching method based on context of programming field environment - Google Patents

Publication number
CN112667286A
Authority
CN
China
Prior art keywords
programming
context information
information
context
environment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011551429.1A
Other languages
Chinese (zh)
Inventor
张智轶
许云剑
黄志球
陶传奇
周玉倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202011551429.1A priority Critical patent/CN112667286A/en
Publication of CN112667286A publication Critical patent/CN112667286A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a search method based on the context of the programming field environment. The method comprises the following steps: acquiring context information of the programming field environment, including context information of the programmer, context information of the programming project and task, and context information of the programming time and environment; applying different preprocessing to the acquired raw context information depending on whether it is text language or formal language, and storing the preprocessed information; clustering the preprocessed context information with the K-means algorithm to obtain the semantic relations among the context information; performing hierarchical analysis on the preprocessed context information with a hierarchical clustering method to obtain the explicit and implicit associations among the context information; and completing the search for the required programming need using ElasticSearch as the underlying data retrieval model. By deeply mining the broad and varied semantic relations among the context information, the invention can recommend code accurately.

Description

Searching method based on context of programming field environment
Technical Field
The invention belongs to the field of computers, relates to data acquisition and inference engine technologies in software development, and in particular relates to a search method based on the context of the programming field environment.
Background
With the development of the internet and the popularity of open source software, the reuse of software and code has become increasingly important for improving software development efficiency, and code search techniques have attracted growing research attention. However, current search methods cannot reliably find the code a user needs from the user's requirements, which inconveniences the user and costs time. How to identify code that is both needed and well suited to a user, based on the habits and completed projects of different users, has therefore become a key problem.
Disclosure of Invention
Purpose of the invention: in view of the shortcomings of the prior art, the invention provides a search method based on the context of the programming field environment, which can supply the code a user needs with high precision and brings a better user experience.
Technical scheme: a search method based on the context of the programming field environment comprises the following steps:
acquiring context information of the programming field environment, including context information of the programmer, context information of the programming project and task, and context information of the programming time and environment;
preprocessing the acquired raw context information of the programming field environment for the text language and storing it;
clustering the preprocessed context information of the programming field environment with the K-means algorithm to obtain the semantic relations among the context information;
performing hierarchical analysis on the preprocessed context information of the programming field environment with a hierarchical clustering method to obtain the explicit and implicit associations among the context information;
completing the search for the required programming need using ElasticSearch as the underlying data retrieval model.
Advantageous effects: the invention extracts information for analysis from questionnaire surveys, the user's everyday programming information and the domain of the projects the user has carried out; by deeply mining the broad and varied semantic relations among the context information, it can recommend code that the user needs and that matches the user's programming habits and programming ability.
Drawings
FIG. 1 is a flowchart of a search method based on programming field context according to an embodiment of the present invention.
Detailed Description
The technical scheme of the invention is further explained below with reference to the drawing.
Referring to FIG. 1, the search method based on the context of the programming field environment first obtains the required user information through explicit acquisition, implicit acquisition and inferential acquisition, and applies natural language processing and text-language preprocessing to the user's query conditions and to the collected data such as the raw context information. Next, the K-means algorithm is used to deeply mine the broad and varied semantic relations among the context information, and latent semantic analysis of text is combined with entity linking analysis to fuse the context knowledge into a complete, semantically unified whole. Hierarchical clustering is then used to mine the associations among the context information, discovering the explicit and implicit associations among the data as well as the rules and characteristics of how the context information changes dynamically. In the iterative reuse of software knowledge, a snowball effect forms a benign cycle of growth and evolution, which supports the acquisition, cleaning, organization and management of large heterogeneous data. Finally, the search for the required programming need is carried out using ElasticSearch as the underlying data retrieval engine. The specific steps are as follows:
Step 1: acquire the required information through explicit acquisition, implicit acquisition and inferential acquisition.
First, the required information is determined. Starting from the three dimensions of people, projects and environment, the context information of the programming field environment comprises context information of the programmer, context information of the programming project and task, and context information of the programming time and environment. More specifically, the context information of the programmer includes: familiarity with the current Integrated Development Environment (IDE), familiarity with the current project, the programmer's experience, the programmer's programming habits, and the programmer's social network. The context information of the programming project and task includes: the command currently in use, the module currently running, method descriptions, method calls, project structure, task type, programming error suggestions, project description, project type and historical recommendation information. The context information of the programming time and environment includes: time information, the version number of the project, the programming location, the interface elements used by the developer and the interface elements of interest to the developer.
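The three dimensions listed above can be pictured as one record per programming session. The following is a minimal sketch only: the class and field names (`FieldContext`, `ide_familiarity` and so on) are hypothetical names chosen for illustration and do not come from the patent, and free-form strings are assumed for every value.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProgrammerContext:
    # "People" dimension (field names are illustrative assumptions).
    ide_familiarity: str = ""
    project_familiarity: str = ""
    experience: str = ""
    programming_habits: List[str] = field(default_factory=list)
    social_network: List[str] = field(default_factory=list)


@dataclass
class ProjectTaskContext:
    # "Project and task" dimension.
    current_command: str = ""
    current_module: str = ""
    method_descriptions: List[str] = field(default_factory=list)
    method_calls: List[str] = field(default_factory=list)
    project_structure: str = ""
    task_type: str = ""
    error_suggestions: List[str] = field(default_factory=list)
    project_description: str = ""
    project_type: str = ""
    recommendation_history: List[str] = field(default_factory=list)


@dataclass
class TimeEnvironmentContext:
    # "Time and environment" dimension.
    time_info: str = ""
    project_version: str = ""
    programming_location: str = ""
    interface_elements_used: List[str] = field(default_factory=list)
    interface_elements_of_interest: List[str] = field(default_factory=list)


@dataclass
class FieldContext:
    programmer: ProgrammerContext = field(default_factory=ProgrammerContext)
    project_task: ProjectTaskContext = field(default_factory=ProjectTaskContext)
    time_environment: TimeEnvironmentContext = field(default_factory=TimeEnvironmentContext)

    def text_fields(self) -> List[str]:
        """Flatten every non-empty value into a list of strings for preprocessing."""
        texts: List[str] = []
        for group in asdict(self).values():
            for value in group.values():
                items = value if isinstance(value, list) else [value]
                texts.extend(item for item in items if item)
        return texts
```

Keeping the three dimensions as separate classes mirrors the people, project and environment split in the text, while `text_fields` gives later steps a single flat list to preprocess.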
The various types of information can be acquired as follows:
1) Explicit acquisition: familiarity with the current Integrated Development Environment (IDE) and familiarity with the current project are obtained by questionnaire; project type, project description, programming error suggestions and historical recommendation information are obtained mainly by searching the software's record documents, development documents and log reports; time information and the project version number can be obtained by communicating with the user and/or querying documents;
2) Implicit acquisition: the programmer's programming habits are determined by analyzing the user's past code documents, bug reports and other documents; the currently running module, the command currently in use, method descriptions, the programming location, the interface elements used by the developer and the interface elements of interest to the developer can be collected implicitly, for example by screen monitoring and mouse operation monitoring;
3) Inferential acquisition (inferencing): the programmer's experience is obtained by using a crawler to crawl the user's record data in programming forums, and the user's social network is obtained by analyzing and reasoning over social network relationships; task type, project structure and method call information are obtained by analyzing the project's requirements, design documents and code structure, so as to infer and summarize the associations between parameters and methods.
The user's query conditions and the collected data, such as the raw context information, are then preprocessed. Text language is preprocessed with word segmentation and keyword extraction. Word segmentation yields syntactic and semantic information about words and sentences. Keywords are a set of words that express the subject of a text and act as an even briefer abstract of it, so the keywords produced by keyword extraction give a quick, rough picture of a document's content.
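As an illustration of this preprocessing step, the sketch below tokenizes a text and ranks keywords by raw term frequency. The patent does not name a specific segmentation or keyword technique, so the regular-expression tokenizer, the small stop-word list and the frequency ranking are all assumptions made for this example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption, not from the patent).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "and", "is", "for", "then"}


def tokenize(text):
    """Lower-case the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def extract_keywords(text, top_k=3):
    """Return the top_k most frequent non-stop-word tokens."""
    counts = Counter(t for t in tokenize(text) if t not in STOP_WORDS)
    # Sort by descending frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]
```

For Chinese text a dedicated word segmenter would be needed in place of the regular expression, since words are not separated by spaces.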
For the preprocessed information, code is vectorized with a bag-of-words model using a one-hot representation: a word that appears in the word sequence takes the value 1, and a word that does not appear takes the value 0. For example, for the documents:
1 Chinese Nanjing Chinese
2 Tokyo Japan Chinese
the word bag is constructed as follows:
Chinese Nanjing Tokyo Japan
For each code text, the value of each word in the word bag is computed according to the one-hot representation to obtain a vector representation of the code text.
The word vector of "Chinese Nanjing Chinese" is:
Chinese Nanjing Tokyo Japan
1       1       0     0
The word vector of "Tokyo Japan Chinese" is:
Chinese Nanjing Tokyo Japan
1       0       1     1
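The worked example above can be reproduced with a few lines of Python. This is a minimal sketch that assumes whitespace-separated tokens and a vocabulary ordered by first appearance; the function names are hypothetical.

```python
def build_vocabulary(documents):
    """Collect distinct words in order of first appearance across all documents."""
    vocabulary, seen = [], set()
    for document in documents:
        for word in document.split():
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary


def binary_vector(document, vocabulary):
    """1 if the vocabulary word occurs in the document, 0 otherwise."""
    present = set(document.split())
    return [1 if word in present else 0 for word in vocabulary]


documents = ["Chinese Nanjing Chinese", "Tokyo Japan Chinese"]
vocabulary = build_vocabulary(documents)
vectors = [binary_vector(d, vocabulary) for d in documents]
# vocabulary -> ['Chinese', 'Nanjing', 'Tokyo', 'Japan']
# vectors    -> [[1, 1, 0, 0], [1, 0, 1, 1]]
```

Note that the repeated word "Chinese" in the first document still maps to 1, because this representation records presence rather than frequency.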
Texts other than code are represented and stored with an n-gram model. For a sentence containing n words, the language model is expressed as: $P(W_1, W_2, \ldots, W_n) = P(W_1)\,P(W_2 \mid W_1)\,P(W_3 \mid W_1, W_2) \cdots P(W_n \mid W_1, W_2, \ldots, W_{n-1})$, where P is the probability that the sentence holds; a larger probability indicates that the sentence is more likely to be valid. To reduce computational complexity, the model relies on the Markov assumption: whether a target word occurs in a sentence depends only on the few words immediately before it. In an n-gram model of order n, these are the preceding n-1 words (here n denotes the order of the model rather than the sentence length), and n is typically 2 or 3.
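To make the Markov assumption concrete, the sketch below estimates a bigram model (n = 2) by maximum likelihood and scores a sentence as the product of its conditional probabilities. The `<s>` start marker, the absence of smoothing and the toy corpus are assumptions of this example, not details given in the patent.

```python
from collections import Counter


def train_bigram(sentences):
    """Count how often each word occurs as a context and each adjacent word pair."""
    context_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        context_counts.update(tokens[:-1])
        pair_counts.update(zip(tokens, tokens[1:]))
    return context_counts, pair_counts


def sentence_probability(sentence, context_counts, pair_counts):
    """P(sentence) as the product of P(word | previous word), without smoothing."""
    tokens = ["<s>"] + sentence.split()
    probability = 1.0
    for previous, current in zip(tokens, tokens[1:]):
        if context_counts[previous] == 0:
            return 0.0  # unseen context: probability is zero in this unsmoothed model
        probability *= pair_counts[(previous, current)] / context_counts[previous]
    return probability


corpus = ["open file", "open socket", "close file"]
context_counts, pair_counts = train_bigram(corpus)
p_seen = sentence_probability("open file", context_counts, pair_counts)       # 2/3 * 1/2
p_unseen = sentence_probability("close socket", context_counts, pair_counts)  # 1/3 * 0
```

A practical system would usually add smoothing so that unseen word pairs do not force the whole sentence probability to zero.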
Step 2: perform cluster analysis on the preprocessed data with the K-means algorithm to obtain the semantic relations among the context information.
The K-means clustering algorithm is an iterative cluster analysis algorithm. K objects are randomly selected as the initial cluster centers; the distance between each object and each cluster center is then calculated, and each object is assigned to its nearest cluster center. A cluster center together with the objects assigned to it represents a cluster. The centers are then recomputed from their assigned objects, and the assignment is repeated until the clusters no longer change.
The invention uses the K-means algorithm to deeply mine the broad and varied semantic relations among the context information, clustering the preprocessed information to find the context text with high topic relevance. Specifically, K pieces of context information related to a topic are selected as the initial cluster centers; the distance between each piece of context information and each center is calculated, and each piece of context information is assigned to its nearest cluster center, giving K clusters. Finally, context information that lies far from its cluster center (for example, beyond a preset distance threshold) is removed, so that the context information with high topic relevance is extracted.
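The procedure described above, topic-related initial centers followed by assignment and threshold-based removal, might look like the following pure-Python sketch. Euclidean distance, the iteration cap and the toy two-dimensional points are assumptions; the patent does not specify the distance measure or the vector dimensionality.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    return [sum(column) / len(points) for column in zip(*points)]


def assign(points, centers):
    """Put each point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for point in points:
        nearest = min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))
        clusters[nearest].append(point)
    return clusters


def kmeans(points, initial_centers, max_iterations=20):
    """Standard K-means starting from caller-supplied (topic-related) centers."""
    centers = [list(c) for c in initial_centers]
    for _ in range(max_iterations):
        clusters = assign(points, centers)
        # An empty cluster keeps its previous center.
        updated = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def drop_outliers(centers, clusters, max_distance):
    """Remove points farther than max_distance from their cluster center."""
    return [[p for p in cluster if euclidean(p, center) <= max_distance]
            for center, cluster in zip(centers, clusters)]


points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [4, 4]]
centers, clusters = kmeans(points, initial_centers=[[0, 0], [10, 10]])
relevant = drop_outliers(centers, clusters, max_distance=2.0)
```

In this toy run the point `[4, 4]` is assigned to the first cluster but lies beyond the threshold, so it is dropped as weakly related to the topic.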
Step 3: perform hierarchical analysis on the preprocessed context information of the programming field environment with a hierarchical clustering algorithm to obtain the explicit and implicit associations among the context information.
Hierarchical clustering divides a data set into clusters layer by layer, and the clusters generated at each layer are based on the results of the previous layer. Hierarchical clustering algorithms generally fall into two categories: (1) divisive hierarchical clustering, also called top-down hierarchical clustering, in which all objects initially belong to one cluster, one cluster is split into several clusters according to some criterion at each step, and the process is repeated until each object is its own cluster; (2) agglomerative hierarchical clustering, in which each object is initially its own cluster, the two closest clusters are merged into a new cluster according to some criterion at each step, and the process is repeated until all objects belong to one cluster.
In the invention, hierarchical clustering is used to mine the associations among the context information in order to discover the explicit and implicit associations among the data. An explicit association is a direct association, while an implicit association is a hidden association among the context information that is obtained by mining. For example, if the gender of Zhang Sanqi is male, the explicit association between Zhang Sanqi and gender is "male"; if mining Zhang Sanqi's everyday activities links Zhang Sanqi to a game, that link is an implicit association between Zhang Sanqi and the game. The preceding K-means step has difficulty extracting implicit context topic information, so the preprocessed context information is processed with agglomerative hierarchical clustering: each piece of text information starts as its own class, the 2 closest classes are merged into one class, and this is repeated until only K classes remain.
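The agglomerative procedure described above (start with one class per item, repeatedly merge the two closest classes, stop at K classes) can be sketched as follows. Single linkage with Euclidean distance is an assumption of this example; the patent only says that the 2 closest classes are merged.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(cluster_a, cluster_b):
    """Distance between two clusters = distance between their closest members."""
    return min(euclidean(p, q) for p in cluster_a for q in cluster_b)


def agglomerate(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]  # every item starts as its own class
    while len(clusters) > k:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                distance = single_linkage(clusters[i], clusters[j])
                if best is None or distance < best[0]:
                    best = (distance, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


points = [[0, 0], [0, 1], [5, 5], [5, 6], [20, 20]]
result = agglomerate(points, k=2)
```

Recording the order of merges, rather than only the final K classes, would give the layered structure from which indirect (implicit) links between items can be read off.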
Step 4: search for the required programming need using the ElasticSearch search engine.
ElasticSearch is a Lucene-based search server. It provides a distributed, multi-user full-text search engine that makes it convenient to search, analyze and explore large amounts of data. It exposes a RESTful web interface, is developed in Java, is released as open source code under Apache licensing terms, and is an enterprise-level search engine.
The invention uses ElasticSearch, an open source project, as the underlying data retrieval engine. The contextual topic information and/or keyword information obtained through the preprocessing and data mining described above is used to retrieve information with ElasticSearch.
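As a rough illustration of how keywords and contextual topic terms could be combined in one request, the sketch below builds a search body in ElasticSearch's Query DSL as a plain Python dictionary. The field names `code_text` and `context_topics` are hypothetical, and the choice of a `bool` query with `must` and `should` clauses is an assumption; the patent does not describe the index mapping or the query shape.

```python
import json


def build_search_body(keywords, topic_terms, size=10):
    """Build a Query DSL body: keywords are required, topic terms boost the score."""
    return {
        "size": size,
        "query": {
            "bool": {
                # The query keywords must match the indexed code text.
                "must": [{"match": {"code_text": " ".join(keywords)}}],
                # Contextual topic terms are optional and only raise relevance.
                "should": [{"match": {"context_topics": term}} for term in topic_terms],
            }
        },
    }


body = build_search_body(["read", "file"], ["java", "io"])
request_json = json.dumps(body)  # could be sent to an index's _search endpoint
```

Putting the context terms in `should` rather than `must` means a missing context signal never excludes a code snippet, it only affects ranking.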
In summary, the invention provides a search method based on the context of the programming field environment which, compared with other search methods, can effectively recommend code that the user needs and that suits the user. Several users searched for code according to their requirements, and search precision was calculated using whether satisfactory code could be found as the criterion. The method provides the required code with higher precision: compared with a search method built on an RNN, the search precision is relatively improved by about 40%, which brings a better user experience.

Claims (6)

1. A search method based on programming field environment context is characterized by comprising the following steps:
acquiring context information of the programming field environment, including context information of the programmer, context information of the programming project and task, and context information of the programming time and environment;
preprocessing the acquired raw context information of the programming field environment for the text language and storing it;
clustering the preprocessed context information of the programming field environment with the K-means algorithm to obtain the semantic relations among the context information;
performing hierarchical analysis on the preprocessed context information of the programming field environment with a hierarchical clustering method to obtain the explicit and implicit associations among the context information;
completing the search for the required programming need using ElasticSearch as the underlying data retrieval model.
2. The search method based on the context of the programming field environment of claim 1, wherein the context information of the programmer comprises: familiarity with the current integrated development environment, familiarity with the current project, the programmer's experience, the programmer's programming habits, and the programmer's social network; the context information of the programming project and task comprises: the command currently in use, the module currently running, method descriptions, method calls, project structure, task type, programming error suggestions, project description, project type and historical recommendation information; the context information of the programming time and environment comprises: time information, the version number of the project, the programming location, the interface elements used by the developer and the interface elements of interest to the developer.
3. The search method based on the context of the programming field environment of claim 2, wherein acquiring the context information of the programming field environment comprises:
obtaining familiarity with the current integrated development environment and familiarity with the current project by questionnaire;
obtaining project type, project description, programming error suggestions and historical recommendation information by searching the software's record documents, development documents and log reports;
obtaining time information and the project version number by communicating with the user and/or querying documents;
determining the programmer's programming habits by analyzing historical code documents and bug report documents;
collecting the currently running module, the command currently in use, method descriptions, the programming location, the interface elements used by the developer and the interface elements of interest to the developer by screen monitoring and mouse operation monitoring;
using a crawler to crawl the user's record data in programming forums as the programmer's experience information; obtaining the user's social network by analyzing and reasoning over social network relationships;
and analyzing the project's requirements, design documents and code structure to obtain the task type, project structure and method call information.
4. The search method based on the context of the programming field environment of claim 1, wherein the preprocessing and storing for the text language comprises: preprocessing the text language with word segmentation and keyword extraction; representing and storing the preprocessed code information with a bag-of-words model and a one-hot representation; and representing and storing the text information with an n-gram model, where the text information refers to context information other than code information.
5. The search method based on the context of the programming field environment of claim 4, wherein performing cluster analysis on the preprocessed context information of the programming field environment with the K-means algorithm to obtain the semantic relations among the context information comprises: selecting K pieces of context information related to a topic as initial cluster centers, calculating the distance between each piece of context information and each center, and assigning each piece of context information to its nearest cluster center, thereby obtaining K clusters; and removing the context information whose distance from its cluster center exceeds a preset threshold, thereby extracting the context information with topic relevance.
6. The search method based on the context of the programming field environment of claim 4, wherein performing hierarchical analysis on the preprocessed context information of the programming field environment with the hierarchical clustering method to obtain the explicit and implicit associations among the context information comprises: processing the preprocessed context information with hierarchical clustering, treating each piece of text information as a class, merging the 2 closest classes into one class, and repeating this in turn until the clustering is finished.
CN202011551429.1A 2020-12-24 2020-12-24 Searching method based on context of programming field environment Pending CN112667286A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011551429.1A CN112667286A (en) 2020-12-24 2020-12-24 Searching method based on context of programming field environment

Publications (1)

Publication Number Publication Date
CN112667286A true CN112667286A (en) 2021-04-16

Family

ID=75408388

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011551429.1A Pending CN112667286A (en) 2020-12-24 2020-12-24 Searching method based on context of programming field environment

Country Status (1)

Country Link
CN (1) CN112667286A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180373507A1 (en) * 2016-02-03 2018-12-27 Cocycles System for generating functionality representation, indexing, searching, componentizing, and analyzing of source code in codebases and method thereof
CN109522011A (en) * 2018-10-17 2019-03-26 南京航空航天大学 A kind of code line recommended method of context depth perception live based on programming

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘斌斌;董威;王戟;: "智能化的程序搜索与构造方法综述", 软件学报, no. 08, pages 2177 - 2197 *
杨君雯;王海;彭鑫;赵文耘;: "基于开发者行为分析的Web资源推荐", 计算机科学, no. 07, pages 147 - 150 *

Similar Documents

Publication Publication Date Title
CN101655857B (en) Method for mining data in construction regulation field based on associative regulation mining technology
CN108052659A (en) Searching method, device and electronic equipment based on artificial intelligence
WO2014107801A1 (en) Methods and apparatus for identifying concepts corresponding to input information
Li et al. Context-based diversification for keyword queries over XML data
Du et al. An approach for selecting seed URLs of focused crawler based on user-interest ontology
Huang et al. Multi-task learning for entity recommendation and document ranking in web search
Arasu et al. A grammar-based entity representation framework for data cleaning
Babur et al. Towards statistical comparison and analysis of models
CN103942204B (en) For excavating the method and apparatus being intended to
Han et al. Explainable artificial intelligence-based competitive factor identification
Liao et al. A vlHMM approach to context-aware search
CN112667286A (en) Searching method based on context of programming field environment
CN115292515A (en) Knowledge graph construction method in sewing equipment modular design field
Babur et al. Towards Distributed Model Analytics with Apache Spark.
CN110930189A (en) Personalized marketing method based on user behaviors
Khattak et al. Context-aware search in dynamic repositories of digital documents
ElGindy et al. Capturing place semantics on the geosocial web
CN106156259A (en) A kind of user behavior information displaying method and system
TABAK et al. Event-based summarization of news articles
Zhang et al. Facilitating Data-Centric Recommendation in Knowledge Graph
Nadim et al. A Comparative Assessment of Unsupervised Keyword Extraction Tools
Wei et al. Extraction Rule Language for Web Information Extraction and Integration
Samizadeh Graph-based Semantical Extractive Text Analysis
Mills et al. A comparative survey on NLP/U methodologies for processing multi-documents
Rezayi et al. A Framework for Knowledge-Derived Query Suggestions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination