Combinatorial Algorithmsfor Nearest Neighbors, Near-Duplicatesand Small-World DesignYury LifshitsYahoo! ResearchShengyu ZhangThe Chinese University of Hong KongSODA 2009Yury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors1 / 29MostsimilarQuery: New objectTask: Find the mostsimilar one in the datasetSimilarity Search: an ExampleInput: Set of objectsTask: Preprocess itYury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors2 / 29MostsimilarSimilarity Search: an ExampleInput: Set of objectsTask: Preprocess itQuery: New objectTask: Find the mostsimilar one in the datasetYury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors2 / 29Similarity Search: an ExampleInput: Set of objectsTask: Preprocess itMostsimilarQuery: New objectTask: Find the mostsimilar one in the datasetYury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors2 / 29Similarity SearchSearch space: object domain U, similarityfunction σInput: database S = {p1, . . . , pn} ⊆ UQuery: q ∈ UTask: find argmax σ(pi, q)Yury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors3 / 29Nearest Neighbors in TheorySphere Rectangle Tree Orchard’s Algorithm k-d-B treeGeometric near-neighbor access tree Excludedmiddle vantage point forest mvp-tree Fixed-heightfixed-queries tree AESA Vantage-pointtree LAESA R∗-tree Burkhard-Keller tree BBD treeNavigating Nets Voronoi tree Balanced aspect ratiotree Metric tree vps-treeM-treeLocality-Sensitive Hashing SS-treeR-tree Spatial approximation treeMulti-vantage point tree Bisector tree mb-tree Covertree Hybrid tree Generalized hyperplane tree Slim treeSpill Tree Fixed queries tree X-tree k-d tree BalltreeQuadtree Octree Post-office treeYury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors4 / 29Typical web dataset has separation effectFor almost all i, j :1/2 ≤ d(pi, pj) ≤ 1Classic methods fail:Branch and bound algorithms visit every objectDoubling dimension is at least log n/2Revision: Basic AssumptionsIn theory:Triangle inequalityDoubling dimension is o(log n)Yury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors5 / 29Classic methods fail:Branch and bound algorithms visit every objectDoubling dimension is at least log n/2Revision: Basic AssumptionsIn theory:Triangle inequalityDoubling dimension is o(log n)Typical web dataset has separation effectFor almost all i, j :1/2 ≤ d(pi, pj) ≤ 1Yury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors5 / 29Revision: Basic AssumptionsIn theory:Triangle inequalityDoubling dimension is o(log n)Typical web dataset has separation effectFor almost all i, j :1/2 ≤ d(pi, pj) ≤ 1Classic methods fail:Branch and bound algorithms visit every objectDoubling dimension is at least log n/2Yury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors5 / 29This work:Better nearest neighbor searchDetecting near-duplicatesNavigability design for small worldsContributionNavin Goyal, YL, Hinrich Schütze, WSDM 2008:Combinatorial framework: new approach to datamining problems that does not require triangleinequalityNearest neighbor algorithmYury Lifshits, Shengyu Zhang()Combinatorial Nearest Neighbors6 / 29Document Outline
- Combinatorial Framework
- New Algorithms
- Combinatorial Nets
- Directions for Further Research
Add New Comment