Spatial Data Mining

Pusheng Zhang
Shashi Shekhar
Department of Computer Science and Engineering
University of Minnesota

[Figure: Sea Surface Temperature (SST) in March, 1982]


Why Data Mining?

Lots of Data are Being Collected
• Business Applications:
  – Transactions: retail, bank ATM, air travel, etc.
  – Web logs, e-commerce, GPS tracks
• Scientific Applications:
  – Remote sensing: e.g., NASA’s Earth Observing System
  – Sky survey
  – Microarrays generating gene expression data

Challenges:
• Volume (data) ≫ number of human analysts
• Some automation needed

Data Mining may help!
• Provide better and customized insights for business
• Help scientists with hypothesis generation

Spatial Data Mining: Accomplishments and Research Needs


Spatial Data

Location-based Services
• Ex: MapQuest, Yahoo Maps, Google Maps, MapPoint

Figure 1: Google Local Search (http://maps.google.com)

In-car Navigation Device

Figure 2: Emerson In-Car Navigation System (Courtesy of Amazon.com)


Spatial Data Mining (SDM)

The process of discovering
• interesting, useful, non-trivial patterns
  – patterns: non-specialist
  – exception to patterns: specialist
• from large spatial datasets

Spatial patterns
• Spatial outliers, discontinuities
  – bad traffic sensors on highways (DOT)
• Location prediction models
  – model to identify habitat of endangered species
• Spatial clusters
  – crime hot-spots (NIJ), cancer clusters (CDC)
• Co-location patterns
  – predator-prey species, symbiosis
  – dental health and fluoride


Location As Attribute

Location as attribute in spatial data mining

What value is location as an explanatory variable?
• most events are associated with space and time
• surrogate variable
• critical to data analyses for many application domains
  – physical science
  – social science

Location helps bring rich contexts
• Physical: e.g., rainfall, temperature, and wind
• Demographic: e.g., age group, gender, and income type
• Problem-specific

Location helps bring relationships
• e.g., distance to open water


Example Spatial Pattern: Spatial Cluster

The 1854 Asiatic Cholera in London


Example Spatial Pattern: Spatial Outliers

Spatial Outliers
• Traffic Data in Twin Cities
• Abnormal Sensor Detections
• Spatial and Temporal Outliers

[Figure: Average Traffic Volume (Time vs. Station); axes: Time, I35W Station ID (South Bound)]


Example Spatial Pattern: Predictive Models

Location Prediction: Bird Habitat Prediction
• Given training data
• Predictive model building
• Predict new data

[Figure: Nest sites for 1995 Darr location; legend: Marsh land, Nest sites; nz = 85]


Example Spatial Pattern: Co-locations (backup)

Given:
• A collection of different types of spatial events

[Figure: Illustration]

Find: Co-located subsets of event types


What’s NOT Spatial Data Mining

Simple Querying of Spatial Data
• Find neighbors of Canada given names and boundaries of all countries
• Find shortest path from Boston to Houston in a freeway map
• Search space is not large (not exponential)

Testing a hypothesis via a primary data analysis
• Ex. Female chimpanzee territories are smaller than male territories
• Search space is not large!
• SDM: secondary data analysis to generate multiple plausible hypotheses

Uninteresting or obvious patterns in spatial data
• Heavy rainfall in Minneapolis is correlated with heavy rainfall in St. Paul, given that the two cities are 10 miles apart.
• Common knowledge: Nearby places have similar rainfall

Mining of non-spatial data
• Diaper sales and beer sales are correlated in evening
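
Backup sketch: spatial outliers

The spatial outlier example (abnormal detections from traffic sensors) can be made concrete with a small Python sketch. It is only one plausible reading of the idea, not necessarily the method behind the traffic figure: each station's value is compared with the mean of its spatial neighbors, the differences are standardized, and stations with a large standardized difference are flagged. The function name `spatial_outliers`, the `threshold` parameter, the neighbor definition (adjacent stations along one highway), and the sample volumes are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def spatial_outliers(values, neighbors, threshold=2.0):
    """Return the ids of locations that differ sharply from their neighbors.

    values:    dict mapping location id -> measured value (hypothetical data)
    neighbors: dict mapping location id -> list of neighboring location ids
    threshold: cutoff on the standardized difference (an assumed default)
    """
    # Difference between each location and the mean of its spatial neighbors.
    diffs = {
        loc: values[loc] - mean([values[n] for n in neighbors[loc]])
        for loc in values
        if neighbors.get(loc)
    }
    mu = mean(list(diffs.values()))
    sigma = pstdev(list(diffs.values()))
    if sigma == 0:
        return []  # every location matches its neighborhood equally well
    # Flag locations whose standardized difference exceeds the threshold.
    return sorted(
        loc for loc, d in diffs.items() if abs((d - mu) / sigma) > threshold
    )


# Made-up traffic volumes for ten consecutive stations; station 4 reads far too low.
volumes = dict(enumerate([100, 102, 98, 101, 10, 99, 103, 100, 97, 101]))
# Assumed neighborhood: the previous and next station along the highway.
adjacent = {i: [j for j in (i - 1, i + 1) if j in volumes] for i in volumes}

flagged = spatial_outliers(volumes, adjacent)
```

The comparison is against the spatial neighborhood rather than the global distribution, so a reading can be flagged even when it would look ordinary among all stations taken together. Note that the neighbors of a bad sensor also get inflated differences, which is why the cutoff matters.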