CN108182181A - Repeated detection method for mass contribution merging request based on mixed similarity - Google Patents
- Publication number
- CN108182181A (application number CN201810100193.6A)
- Authority
- CN
- China
- Prior art keywords
- contribution
- similarity
- request
- public
- merges
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
Abstract
The invention belongs to the field of collaborative software development and discloses a method for detecting duplicate crowd contribution merge requests (pull requests) based on hybrid similarity. For a newly submitted pull request, the method first calculates its text similarity to each historical pull request, and then calculates its change similarity to each historical pull request. A dataset of historical duplicate pull requests is collected from a popular collaborative development platform and, trained on this dataset, a weight calculation method based on a greedy search strategy combines the two kinds of similarity into a hybrid similarity between pull requests. Finally, the historical pull requests are ranked by hybrid similarity to obtain a list of those most likely to duplicate the given pull request. The invention detects duplicate contributions promptly, avoids redundant manual code review, and improves the efficiency of reviewing crowd contributions.
Description
Technical field
The invention belongs to the field of collaborative software development and relates to a method for detecting duplicate crowd contribution merge requests (pull requests) based on hybrid similarity.
Background
In open source communities such as GitHub, a development model based on large-scale group collaboration has greatly improved the efficiency of software innovation and has drawn more and more developers into the production of open source software. However, this model is parallel and has no central coordination. When several developers spontaneously contribute code to the same open source project and want to achieve the same goal, they may submit duplicate contribution merge requests (i.e. pull requests). Popular projects that attract many peripheral developers and continuously receive community contributions are especially prone to this problem. As shown in Fig. 2, two developers, Bob and Alice, each clone (fork) the same main repository and then make changes independently in their own local clones. When they want to implement the same feature or fix the same defect, neither is aware of what the other is doing, so both may make corresponding changes and submit pull requests to the main repository. Each of the two pull requests then goes through its own review and update cycle until some developer notices that they are duplicates.
Duplicate pull requests waste platform resources and increase the maintenance cost of the platform. They also cause the review process to be carried out repeatedly, which costs reviewers extra time and effort. During the life cycle of a pull request (the period from its submission to the platform until it is accepted or rejected), a duplicate may be identified at any point in time, and the later it is identified, the more resources and effort have been wasted. In addition, during review the contributor usually keeps updating and improving the pull request in response to reviewer feedback. If duplicates are not identified early, the two contributors may therefore both do redundant work and may come to doubt the competence of the project's management team. The negative effect on a contributor is even more serious when their pull request is judged to be a duplicate of one that was submitted later and is closed by the reviewer.
At present, the GitHub platform (GitHub is a hosting platform for open source and private software projects; it is named GitHub because it supports only Git as its repository format) relies on reviewers to find duplicate pull requests manually. For popular projects, however, crowd developers continuously submit code contributions to the main repository, and a large number of pull requests require code review. It is unrealistic to expect a reviewer to remember every historical pull request, compare each newly submitted one against them, and then judge whether it is a duplicate. Under the current mechanism, duplication is discovered only when some developer happens to notice two duplicate pull requests, so most duplicates are not identified in time. A tool that automatically detects duplication at the moment a pull request is submitted is therefore highly necessary. First, an automatic detection tool assists reviewers and spares them redundant work. Second, detecting duplicates immediately lets the two contributors get in touch and cooperate as early as possible instead of each continuing to do duplicate work.
Summary of the invention
To solve the above technical problem, the invention proposes a hybrid-similarity-based method for detecting duplicate contributions that may exist on open source project hosting platforms. The specific technical solution is as follows.
A method for detecting duplicate crowd contribution merge requests (pull requests) based on hybrid similarity comprises the following steps:
S1. Calculate the text similarity between the newly submitted pull request and each historical pull request; the text similarity comprises title text similarity and description text similarity.
S2. Calculate the change similarity between the newly submitted pull request and each historical pull request; the change similarity is the path similarity between the files changed by the pull requests.
S3. Collect a dataset of historical duplicate pull requests from a collaborative development platform, train a weight search algorithm based on a greedy strategy on this dataset to obtain the weights of the text similarities and the change similarity, and then calculate the hybrid similarity between pull requests from these weights.
S4. Following steps S1 to S3, obtain a hybrid similarity for each historical pull request, sort the historical pull requests by hybrid similarity, and obtain a list of historical pull requests that duplicate the newly submitted pull request.
Further, the detailed process of step S1 is:
S11. Extract the title text and the description text from the newly submitted pull request and from the historical pull request, giving two title texts and two description texts.
S12. Preprocess the title texts and the description texts.
S13. Convert each preprocessed title text and description text into a multi-dimensional vector, giving two title text vectors and two description text vectors.
S14. Calculate the similarity between the two title text vectors with the cosine formula; this is the title text similarity between the newly submitted pull request and the historical pull request. Calculate the similarity between the two description text vectors with the cosine formula; this is the description text similarity between the newly submitted pull request and the historical pull request.
Further, the detailed process of step S2 is:
S21. Extract the files actually changed by the newly submitted pull request and by the historical pull request, giving two file sets.
S22. Calculate the path similarity between pairs of files drawn from the two file sets; this gives the change similarity between the newly submitted pull request and the historical pull request.
Further, the collaborative development platform is the GitHub platform.
Further, the preprocessing of the title text and description text in step S12 comprises tokenization, stemming and stop-word removal.
Compared with the prior art, the invention has the following advantages:
1. For duplicate contributions that may exist on open source project hosting platforms, the invention proposes a detection method based on hybrid similarity. The method is a key link in improving code review efficiency: it spares reviewers duplicate review work, helps core developers organize the code review process more efficiently, and improves the efficiency with which crowd contributions are integrated.
2. The invention combines the text similarity of pull request titles and descriptions with the change similarity of the changed files to calculate the similarity between pull requests, which better reveals duplication between them.
3. Through automatic identification followed by manual checking, the invention collects a dataset of historical duplicate pull requests from the GitHub platform; the dataset can be used to optimize the automatic duplicate detection model and improve its detection performance.
4. The invention combines the two kinds of similarity efficiently using a strategy based on greedy search, yielding a hybrid similarity value that better reflects the similarity of pull requests.
Description of the drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 is a flow chart of several developers contributing in parallel, as described in the background;
Fig. 3 is the program code listing of the algorithm that calculates the change similarity of pull requests;
Fig. 4 is the program code listing of the algorithm that calculates the path similarity of two files;
Fig. 5 is the program code listing of the weight calculation algorithm based on the greedy search strategy.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are obviously only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the invention.
Fig. 1 shows the flow of the method of the invention. The steps are as follows:
S1. Calculate the text similarity between the newly submitted pull request and each historical pull request; the text similarity comprises title text similarity and description text similarity.
The text extracted from the title and description of a pull request first goes through a standard preprocessing pipeline comprising tokenization, stemming and stop-word removal. Many existing strategies can be used to split a sentence into tokens, and the choice depends on the type of data and the application domain. Some text would be split into several words under ordinary rules but should be treated as a single word. In the context of pull requests, for example, text that represents a code path or a hyperlink is usually long but refers to one complete concept, so it should not be split. A regular expression tokenizer is therefore used to parse the raw text. Some of the regular expressions and the text they match are listed below.
Code path:
– \w+(?:\:\:\w+)*
– “ActionDispatch::Http::URL”
Number of a pull request on the GitHub platform:
– \#\d+
– “#10319”
After the text is tokenized, each word is converted to its root form (for example, "was" is converted to "be" and "errors" to "error"); this conversion is done by the Porter stemming algorithm. Finally, stop words that occur frequently but contribute little to distinguishing one sentence from another (such as "the" and "a") are removed.
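The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the combined token pattern, the tiny stop-word list and the `naive_stem` helper are all hypothetical, and `naive_stem` is only a rough stand-in for Porter stemming.

```python
import re

# Alternatives are tried left to right, so code paths (Foo::Bar) and
# pull request references (#123) are matched before plain words.
TOKEN_RE = re.compile(r"\w+(?:::\w+)+|#\d+|\w+")

# Tiny illustrative stop-word list (assumption); a real list is much longer.
STOP_WORDS = {"the", "a", "an", "in", "of", "to", "is"}


def naive_stem(token: str) -> str:
    """Rough stand-in for Porter stemming: strips a plural 's' only."""
    if (token.isalpha() and len(token) > 3
            and token.endswith("s") and not token.endswith("ss")):
        return token[:-1]
    return token


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words and stem, keeping code paths and PR numbers whole."""
    result = []
    for tok in TOKEN_RE.findall(text):
        if "::" in tok or tok.startswith("#"):
            result.append(tok)  # one complete concept: keep it unchanged
            continue
        low = tok.lower()
        if low in STOP_WORDS:
            continue
        result.append(naive_stem(low))
    return result
```

A production pipeline would more likely use an established Porter stemmer and a full stop-word list; the point here is only that special tokens must be matched before the generic word pattern.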
The preprocessed text is then converted, using the TF-IDF model (Term Frequency-Inverse Document Frequency), into a multi-dimensional vector that can be used in the vector space model (VSM). The vectorized text i is expressed as:
$$\mathrm{TextVec}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,v})$$
Each dimension of the vector corresponds to one word, and v is the total number of words in the whole text corpus. The value $w_{i,k}$ is the weight of the k-th element of the vector for text i and is calculated by the TF-IDF model:
$$w_{i,k} = tf_{i,k} \times idf_{i,k}$$
In this formula, $tf_{i,k}$ is the term frequency, i.e. the frequency with which the k-th word occurs in text i, and $idf_{i,k}$ is the inverse document frequency, which measures how well the word discriminates between documents.
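As an illustration of the weighting $w_{i,k} = tf_{i,k} \times idf_{i,k}$, the sketch below builds TF-IDF vectors for a small corpus of already tokenized texts. The exact TF and IDF variants are not specified in the text, so the choices here (length-normalized term frequency, and `log(N / df) + 1` as the inverse document frequency) are assumptions, as is the function name `tfidf_vectors`.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary and one TF-IDF vector per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Assumed IDF variant; the "+ 1" keeps words found in every document non-zero.
    idf = {tok: math.log(n_docs / df[tok]) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty texts
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors
```

Each vector has one component per vocabulary word, so vectors from the same call can be compared directly with a cosine measure.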
After the texts are vectorized, the similarity SimText(i, j) of two text vectors $\mathrm{TextVec}_i$ and $\mathrm{TextVec}_j$ is calculated with the cosine formula:
$$\mathrm{SimText}(i, j) = \frac{\mathrm{TextVec}_i \cdot \mathrm{TextVec}_j}{|\mathrm{TextVec}_i| \, |\mathrm{TextVec}_j|}$$
Applying this formula gives the title text similarity $\mathrm{SimText}_{title}(i, j)$ and the description text similarity $\mathrm{SimText}_{desc}(i, j)$ between two pull requests; i and j denote the texts, and $|\cdot|$ denotes the norm of a vector.
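The cosine measure can be written in a few lines of Python. This is a minimal sketch; returning 0.0 for a zero vector (for example an empty description) is an assumption made here, since the text does not say how that case is handled.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for empty texts
    return dot / (norm_u * norm_v)
```

The same function would be applied once to the two title vectors and once to the two description vectors.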
S2. Calculate the change similarity between the newly submitted pull request and each historical pull request; the change similarity is the path similarity between the files changed by the pull requests.
Collaboration on the GitHub platform relies on Git. After a contributor submits a pull request on GitHub, the changes it involves are presented as a diff. To calculate the similarity of two pull requests from the diff information, the raw diff data is first parsed into structured data, from which the modules and files changed by each pull request are extracted. The algorithm in Fig. 3 calculates the change similarity between two pull requests. Its input is the two sets of files, files_i and files_j, changed by the two pull requests (denoted PR in the algorithm). Line 1 of the code in Fig. 3 initializes a list that stores the intermediate results. Lines 2 to 5 calculate the file path similarity of every pair of files drawn from the two file sets, using the algorithm in Fig. 4, and store each pair together with its similarity in the list. Line 6 sorts the list elements by similarity, and line 7 determines how many file pairs are finally retained. Line 8 initializes a new list that stores the retained file pairs and their similarity values. Lines 9 to 13 repeatedly take the file pair with the highest similarity from the temporary list and put its similarity into the final list. A file may appear in the final list only once, that is, each file has its maximum similarity with exactly one other file, so line 12 deletes from the intermediate list every remaining pair that contains either file of the pair just selected. Finally, the similarity values in the final list are summed and divided by the size of the larger of the two changed file sets, which gives the change similarity of the two pull requests.
The algorithm in Fig. 4 calculates the path similarity of two files. The function first splits the two file paths at the path separator, giving two sequences of directory names. Lines 3 to 7 then calculate the depth of the longest common subdirectory of the two paths, and this depth divided by the larger of the two path depths is the path similarity of the two files.
S3. Collect a dataset of historical duplicate pull requests from a collaborative development platform, train a weight search algorithm based on a greedy strategy on this dataset to obtain the weights of the text similarities and the change similarity, and then calculate the hybrid similarity between pull requests from these weights.
In this embodiment, the dataset of historical duplicate pull requests is collected from the GitHub platform as follows:
(1) Random sampling: 26 popular projects on the GitHub platform are selected; for each project, a subset of its pull requests is drawn at random.
(2) Manual screening: for each selected pull request, every comment that references another pull request is checked by hand, and the comments that concern duplication between pull requests are picked out. The invention calls such comments indicative comments.
(3) Rule extraction: the indicative comments collected in the previous step show that, when commenters point out that one pull request duplicates another, they frequently use certain words or phrases. For example, "dup of", "closed by" and "addressed in" in the following comments are phrases that reviewers often use to point out duplication.
– “dup of #xxxx”
– “Closed by https://github.com/rails/rails/pull/13867”
– “This has been addressed in #27768.”
Regular expressions are therefore extracted from these indicative comments and used as rules to match indicative comments automatically. An example of such a rule is listed below:
clos(e|ed|ing)(\w+){,5}(by|of)(\w+:){,5}#\d+
(4) Automatic identification: with the regular expression rules above, indicative comments can be identified automatically, and pairs of mutually duplicate pull requests can thus be found. If a comment is identified as an indicative comment, the number of the referenced pull request is extracted from the comment, and that pull request and the pull request to which the comment belongs form a pair of duplicate pull requests.
(5) Manual checking: identification by rule can introduce erroneous data, i.e. some pairs of pull requests that are not actually duplicates of each other. The automatically identified data therefore needs to be checked by hand. The criteria for the manual check are:
1) The author of the duplicate pull request was not aware of the source pull request. Whether the author knew is judged from the comments on the two pull requests.
2) The reviewers agree that the pull requests are duplicates. That is, after one reviewer states that a pull request duplicates another, no other reviewer objects, and the reviewers instead show their agreement by closing one of the pull requests.
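The automatic identification step can be sketched as below. The pattern `INDICATIVE_RE` is a hypothetical rule written for this example so that it covers the three sample comments; it is not one of the patent's own rules, and `referenced_duplicate` is likewise an illustrative name.

```python
import re
from typing import Optional

# Hypothetical rule: an indicative phrase followed by "#<number>" or a
# GitHub pull request URL. The number is captured in group 1.
INDICATIVE_RE = re.compile(
    r"\b(?:dup(?:licate)?\s+of|clos(?:e|ed|ing)\s+by|addressed\s+in)\s*"
    r"(?:#|https://github\.com/[\w.-]+/[\w.-]+/pull/)(\d+)",
    re.IGNORECASE,
)


def referenced_duplicate(comment: str) -> Optional[int]:
    """Return the referenced pull request number if the comment is indicative."""
    match = INDICATIVE_RE.search(comment)
    return int(match.group(1)) if match else None
```

Pairs found this way are only candidates; as the manual checking step describes, each pair still needs to be confirmed by hand before it enters the dataset.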
Once the various kinds of similarity between a given pull request and the historical pull requests have been calculated, they are used to find the list of historical pull requests most similar to the given one. To make full use of these kinds of similarity, the invention calculates the final similarity between two pull requests as a hybrid similarity. The final similarity Sim(i, j) is calculated as:
$$\mathrm{Sim}(i, j) = a \times \mathrm{SimText}_{title}(i, j) + b \times \mathrm{SimText}_{desc}(i, j) + c \times \mathrm{SimDiff}_{file}(i, j)$$
In this formula, Sim(i, j) is the hybrid similarity obtained as a weighted combination of the title text similarity $\mathrm{SimText}_{title}(i, j)$, the description text similarity $\mathrm{SimText}_{desc}(i, j)$ and the change similarity $\mathrm{SimDiff}_{file}(i, j)$, whose weights are a, b and c respectively. To choose good weights, as shown in Fig. 5, their values are determined automatically by a greedy search algorithm on the collected GitHub dataset of duplicate pull requests. The input of the algorithm comprises a set of duplicate pull requests, the maximum number of iterations, and the step size tried in each search. The algorithm returns a locally optimal set of weights. In the code shown in Fig. 5, the first three lines (lines 1-3) initialize the three weights, form them into a vector, and use the initial weight vector to obtain the initial evaluation function value. The evaluation function assesses whether a weight vector reflects how much each kind of similarity contributes to the actual similarity of pull requests, i.e. whether the weight vector yields similarity proportions that better match reality. For a given pull request, the higher its duplicates rank in the returned list, the better, so the evaluation function fitness is defined in terms of the rank positions of known duplicates.
In the definition of fitness, DupPR denotes the dataset of historical duplicate pull requests, wts denotes the current weight vector, and $\langle pr_e, pr_l \rangle$ denotes a pair of duplicate pull requests; $\mathrm{SimPRs}(pr_l)$ returns the list of pull requests most similar to $pr_l$, and $\mathrm{Rank}(pr_e, \mathrm{SimPRs}(pr_l))$ returns the position of $pr_e$ in that list.
Lines 4-21 of Fig. 5 search iteratively for better weight parameters until the number of iterations reaches the maximum specified in the input. Line 5 first creates a list that stores the search history of the current iteration. In each iteration, the algorithm tries to change the weight vector in two directions: a forward search (lines 7-10) and a backward search (lines 11-14). At the start of each search, the current best weight vector is first saved (lines 7 and 12). In the forward search the weight under examination is increased by one step (line 8), and in the backward search it is decreased by one step (line 12). The updated weight vector is used to calculate a new evaluation function value (lines 9 and 13), and the new weight vector is also recorded in the history list search_history (lines 10 and 14). After all the weights a, b and c have been examined once, the highest evaluation function value is taken from the search history (line 15). If this value is higher than the evaluation function value of the current best weight vector, the best weight vector and the best evaluation function value are both updated (lines 16-19); otherwise nothing is updated. The next iteration then starts. Finally, the algorithm in Fig. 5 outputs the best-performing weight vector (line 23).
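The search loop described above can be sketched in Python as follows. Fig. 5 and the exact fitness formula are not reproduced in this text, so several parts are assumptions: the fitness used here is the mean reciprocal rank of the known duplicate (chosen only because it rewards duplicates that rank higher), the input layout `sims[pr_l][pr]` of precomputed similarity triples is hypothetical, and the early exit when no candidate improves is a shortcut that gives the same result as idling until the iteration limit.

```python
from typing import Callable

Weights = tuple[float, ...]


def make_fitness(dup_pairs, sims) -> Callable[[Weights], float]:
    """Build an assumed fitness: mean reciprocal rank of each known duplicate.

    dup_pairs: list of (pr_e, pr_l) duplicate pairs.
    sims[pr_l][pr]: (title_sim, desc_sim, file_sim) between pr_l and candidate pr.
    """
    def fitness(wts: Weights) -> float:
        total = 0.0
        for pr_e, pr_l in dup_pairs:
            ranked = sorted(
                sims[pr_l],
                key=lambda pr: sum(w * s for w, s in zip(wts, sims[pr_l][pr])),
                reverse=True,
            )
            total += 1.0 / (ranked.index(pr_e) + 1)
        return total / len(dup_pairs)
    return fitness


def greedy_weight_search(fitness, init: Weights, step: float, max_iter: int) -> Weights:
    """Try one step up and one step down on each weight; keep the best candidate."""
    best, best_score = init, fitness(init)
    for _ in range(max_iter):
        search_history = []
        for k in range(len(best)):
            for delta in (step, -step):  # forward search, then backward search
                cand = list(best)
                cand[k] = max(0.0, cand[k] + delta)
                cand = tuple(cand)
                search_history.append((fitness(cand), cand))
        top_score, top_weights = max(search_history, key=lambda h: h[0])
        if top_score <= best_score:
            break  # no candidate improves: a local optimum has been reached
        best, best_score = top_weights, top_score
    return best
```

Because the fitness depends only on rank positions, small steps can leave it unchanged, so the step size has a direct effect on which local optimum the search reaches.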
S4. Following steps S1 to S3, obtain a hybrid similarity for each historical pull request, sort the historical pull requests by hybrid similarity, and obtain a list of historical pull requests that duplicate the newly submitted pull request. In an embodiment, a value top-k can be preset, and the first top-k historical pull requests in the list are taken as the pull requests most similar to the newly submitted one.
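The final ranking step can be illustrated with a short Python sketch. The function names, the dictionary layout of the precomputed similarities and the example weights are hypothetical; only the weighted sum and the top-k cut-off come from the text.

```python
def hybrid_similarity(title_sim: float, desc_sim: float, file_sim: float,
                      weights: tuple[float, float, float]) -> float:
    """Weighted sum a*title + b*desc + c*file."""
    a, b, c = weights
    return a * title_sim + b * desc_sim + c * file_sim


def top_k_candidates(candidate_sims: dict[int, tuple[float, float, float]],
                     weights: tuple[float, float, float],
                     k: int = 5) -> list[tuple[int, float]]:
    """Rank historical pull requests by hybrid similarity and keep the first k."""
    scored = [(pr, hybrid_similarity(*sims, weights))
              for pr, sims in candidate_sims.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, with weights (0.4, 0.3, 0.3) a historical pull request whose three similarities are (0.9, 0.8, 1.0) is ranked ahead of one with (0.5, 0.5, 0.5).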
In summary, the hybrid-similarity-based method proposed by the invention for detecting duplicate pull requests detects duplicate crowd contributions promptly, avoids redundant manual code review, and improves the efficiency of reviewing crowd contributions.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims.
Claims (5)
1. a kind of public contribution based on hybrid similarity merges the repeated detection method of request, which is characterized in that including following
Step:
The public contribution that S1, calculating are newly submitted, which merges, asks to merge the text similarity between request with history masses contribution, described
Text similarity includes title text similarity and description text similarity;
The public contribution that S2, calculating are newly submitted, which merges, asks to merge the change similarity between request with history masses contribution, described
Change similarity, which refers to that public contribution merges, asks to change the similarity of paths between file;
S3, one group of history repetition contribution merging requested data set is collected on Collaborative Development Platform, using based on Greedy strategy
Weight searching algorithm repeats the history contribution merging requested data set and is trained, and obtains text similarity and becomes more like
The weighted value of degree further calculates the hybrid similarity between public contribution merging request according to weighted value;
S4, according to the step S1 to step S3, each history masses, which contribute, to be merged request corresponding to obtain a mixing similar
Degree, is ranked up according to the size of hybrid similarity value, obtains one group and merges going through for request repetition with the public contribution newly submitted
History masses, which contribute, merges request list.
2. a kind of public contribution of base hybrid similarity as described in claim 1 merges the repeated detection method of request, special
Sign is that the detailed process of the step S1 is:
S11, merge from the public contribution of the new submission ask to merge to extract in request with history masses contribution to mark respectively
Text and description text are inscribed, obtains two title texts and two description texts;
S12, title text and description text are pre-processed;
S13, will by pretreatment title text and description text be respectively converted into multi-C vector, obtain two title texts to
Amount and two description text vectors;
S14, the similarity between two title text vectors, i.e., the public contribution of described new submission are calculated using Cosine formula
Merge the title text similarity that request merges request with history masses contribution;Two descriptions are calculated using Cosine formula
Similarity between text vector, i.e., the public contribution of described new submission merge request and merge request with history masses contribution
Description text similarity.
3. The hybrid-similarity-based duplicate detection method for public contribution merge requests according to claim 1, characterized in that the detailed process of step S2 is:
S21, extracting the files modified by the newly submitted public contribution merge request and the files modified by the historical public contribution merge request, obtaining two file sets;
S22, calculating the path similarity between the files of the two file sets, pair by pair, which gives the change similarity between the newly submitted public contribution merge request and the historical public contribution merge request.
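Steps S21 and S22 can be sketched as follows. The claim does not give the path-similarity formula, so this example makes two assumptions: the similarity of two paths is the number of shared leading path components divided by the length of the longer path, and the set-level change similarity is the average best match of each file in the new request. Both functions are hypothetical illustrations.

```python
def path_similarity(p, q):
    """Shared leading components divided by the longer path's length."""
    a, b = p.split("/"), q.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))


def change_similarity(new_files, old_files):
    """Average, over the new request's files, of the best match among
    the historical request's files (an assumed aggregation rule)."""
    if not new_files or not old_files:
        return 0.0
    best = [max(path_similarity(p, q) for q in old_files) for p in new_files]
    return sum(best) / len(best)
```

With this rule, two requests that touch the same file score 1.0, and requests that touch files in the same directory still receive partial credit.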
4. The hybrid-similarity-based duplicate detection method for public contribution merge requests according to claim 1, characterized in that the collaborative development platform is the GitHub platform.
5. The hybrid-similarity-based duplicate detection method for public contribution merge requests according to claim 2, characterized in that the preprocessing of the title texts and description texts in step S12 specifically includes tokenization, stemming, and stop-word removal.
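The preprocessing named in claim 5 can be sketched with the standard library alone. This is only an illustration: the stop-word list is a tiny hypothetical sample, and `crude_stem` is a simple suffix stripper standing in for a real stemming algorithm, which the claim does not name.

```python
import re

# Tiny illustrative stop-word list (assumption, not from the patent).
STOP_WORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "is", "this"}


def crude_stem(word):
    """Strip one common English suffix, keeping at least 3 characters."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stop words, then stem the remaining tokens.

    Stop words are filtered before stemming so that the list can hold
    ordinary unstemmed words (an ordering choice of this sketch).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

The resulting token lists are what step S13 would turn into vectors.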
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810100193.6A CN108182181B (en) | 2018-02-01 | 2018-02-01 | Repeated detection method for mass contribution merging request based on mixed similarity |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108182181A (en) | 2018-06-19 |
CN108182181B (en) | 2021-03-26 |
Family
ID=62551963
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810100193.6A Active CN108182181B (en) | 2018-02-01 | 2018-02-01 | Repeated detection method for mass contribution merging request based on mixed similarity |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108182181B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060080356A1 (en) * | 2004-10-13 | 2006-04-13 | Microsoft Corporation | System and method for inferring similarities between media objects |
CN103488707A (en) * | 2013-09-06 | 2014-01-01 | 中国人民解放军国防科学技术大学 | Method of searching for candidate classes based on greedy strategy and heuristic algorithm |
US20150156232A1 (en) * | 2013-12-02 | 2015-06-04 | Pankaj Sharma | System and method for generating and merging activity-entry reports utilizing activity-entry hierarchy and hierarchical information of the activity-entries |
CN104331342A (en) * | 2014-01-06 | 2015-02-04 | 广州三星通信技术研究有限公司 | Method for file path matching and the device thereof |
CN105389330A (en) * | 2015-09-21 | 2016-03-09 | 中国人民解放军国防科学技术大学 | Cross-community matched correlation method for open source resources |
CN105955937A (en) * | 2016-05-16 | 2016-09-21 | 浪潮电子信息产业股份有限公司 | Method to compare the iso differences and similarities of the linux system discs |
Non-Patent Citations (2)
Title |
---|
Zhixing Li et al.: "Detecting Duplicate Pull-requests in GitHub", Internetware '17, September 23, 2017 * |
He Li et al.: "Candidate Category Search in Large-Scale Hierarchical Classification", Chinese Journal of Computers (计算机学报) * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109359292A (en) * | 2018-08-31 | 2019-02-19 | 大连诺道认知医学技术有限公司 | Medical literature screening technique and device |
CN109359292B (en) * | 2018-08-31 | 2023-04-07 | 大连诺道认知医学技术有限公司 | Medical literature screening method and device |
CN111310834A (en) * | 2020-02-19 | 2020-06-19 | 深圳市商汤科技有限公司 | Data processing method and device, processor, electronic equipment and storage medium |
CN113379271A (en) * | 2021-06-22 | 2021-09-10 | 中国人民解放军国防科技大学 | Open source platform-oriented abandoned contribution takeover recommendation method, device and equipment |
Also Published As
Publication number | Publication date |
---|---|
CN108182181B (en) | 2021-03-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116628172B (en) | Dialogue method for multi-strategy fusion in government service field based on knowledge graph | |
Xia et al. | Tag recommendation in software information sites | |
Benachio et al. | Interactions between lean construction principles and circular economy practices for the construction industry | |
CN105393265A (en) | Active featuring in computer-human interactive learning | |
CN105740227B (en) | A kind of genetic simulated annealing method of neologisms in solution Chinese word segmentation | |
CN108182181A (en) | Repeated detection method for mass contribution merging request based on mixed similarity | |
CN112328800A (en) | System and method for automatically generating programming specification question answers | |
CN103838857A (en) | Automatic service combination system and method based on semantics | |
CN105335510A (en) | Text data efficient searching method | |
CN115390806A (en) | Software design mode recommendation method based on bimodal joint modeling | |
CN105160046A (en) | Text-based data retrieval method | |
Mustafa et al. | Optimizing document classification: Unleashing the power of genetic algorithms | |
CN113010771A (en) | Training method and device for personalized semantic vector model in search engine | |
Kardoost et al. | Devranker: an effective approach to rank developers for bug report assignment | |
CN117112794A (en) | Knowledge enhancement-based multi-granularity government service item recommendation method | |
Secer et al. | Ontology mapping using bipartite graph | |
Algosaibi et al. | Using the semantics inherent in sitemaps to learn ontologies | |
CN114461813A (en) | Data pushing method, system and storage medium based on knowledge graph | |
Eibeck et al. | A simple and efficient approach to unsupervised instance matching and its application to linked data of power plants | |
Li et al. | Machine learning methodology for enhancing automated process in IT incident management | |
Chetoui et al. | Course recommendation model based on Knowledge Graph Embedding | |
Sathyan et al. | Two-Layered Machine Learning Approach for Sentiment Analysis of tweets related to Electric Vehicles | |
CN116991877B (en) | Method, device and application for generating structured query statement | |
CN103020206A (en) | Knowledge-network-based search result focusing system and focusing method | |
Yasmin et al. | Developing a framework for potential candidate selection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||