2024-03-28T10:49:33Z
https://repository.nii.ac.jp/oai
oai:repository.nii.ac.jp:00001226
2023-01-05T01:06:24Z
136
NII Technical Report (NII-2006-008E): A Generic Query-Based Model for Scalable Clustering
Houle, Michael E.
Technical Report
This paper presents a generic model for clustering that requires no direct knowledge of the nature or representation of the data. In lieu of such knowledge, the relevant-set clustering (RSC) model relies solely on the existence of an oracle that accepts a query in the form of a data item, and returns a ranked set of items relevant to the query. In principle, the role of the oracle could be played by any similarity search structure, or even a commercial search engine whose ranking function and relevancy scores are kept secret. The quality of cluster candidates, the degree of association between pairs of cluster candidates, and the degree of association between clusters and data items are all assessed according to the statistical significance of a form of correlation among pairs of relevant sets and/or candidate cluster sets. A scalable clustering heuristic based on the RSC model is also presented, and demonstrated for very large, high-dimensional datasets using a fast approximate similarity search structure as the oracle.
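The oracle interface and the idea of correlating relevant sets can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the names `make_toy_oracle` and `set_correlation` are hypothetical, the toy oracle simply ranks integers by distance, and the correlation used is the Pearson correlation of the sets' 0/1 membership vectors, which may differ from the exact measure and significance test defined in the report.

```python
from math import sqrt
from typing import Callable, List, Sequence, Set

# Assumed oracle signature: (query item id, result size) -> ranked item ids.
Oracle = Callable[[int, int], List[int]]


def make_toy_oracle(points: Sequence[int]) -> Oracle:
    """Hypothetical stand-in for a similarity search structure.

    Ranks item ids by absolute distance to the query item on a line.
    The clustering code below never looks at `points` directly; it only
    sees the ranked ids the oracle returns.
    """
    def oracle(query: int, k: int) -> List[int]:
        ranked = sorted(
            range(len(points)),
            key=lambda i: (abs(points[i] - points[query]), i),
        )
        return ranked[:k]

    return oracle


def set_correlation(a: Set[int], b: Set[int], n: int) -> float:
    """Pearson correlation of the membership vectors of a and b over n items.

    Assumption: this is one plausible "form of correlation" between sets,
    not necessarily the one used in the report.
    """
    if not a or not b or len(a) == n or len(b) == n:
        raise ValueError("correlation is undefined for empty or full sets")
    numerator = n * len(a & b) - len(a) * len(b)
    denominator = sqrt(len(a) * len(b) * (n - len(a)) * (n - len(b)))
    return numerator / denominator


# Two well-separated groups of three items each.
points = [0, 1, 2, 50, 51, 52]
oracle = make_toy_oracle(points)

rel0 = set(oracle(0, 3))  # relevant set for item 0
rel1 = set(oracle(1, 3))  # relevant set for item 1
rel4 = set(oracle(4, 3))  # relevant set for item 4

same_group = set_correlation(rel0, rel1, len(points))
other_group = set_correlation(rel0, rel4, len(points))
```

In this toy setting, items whose relevant sets coincide receive the maximum correlation and items with disjoint relevant sets receive a negative one, which is the kind of signal a relevant-set approach could use to judge whether two queries belong to the same cluster candidate.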
National Institute of Informatics (国立情報学研究所)
2006-05-19
eng
departmental bulletin paper
https://doi.org/10.20736/0000001226
https://repository.nii.ac.jp/records/1226
10.20736/0000001226
1346-5597
NII Technical Report
1
21
https://repository.nii.ac.jp/record/1226/files/06-008E.pdf
application/pdf
304.1 kB
2019-03-12