CSE 450
Web Mining Seminar
Spring 2008
MWF 11:10-12:00pm, Maginnes 113

Instructor: Dr. Brian D. Davison
Dept. of Computer Science & Engineering
Lehigh University
davison@cse.lehigh.edu
http://www.cse.lehigh.edu/~brian/course/webmining/

Course Objectives
- To gain a background in web mining techniques
- To become proficient at reading technical papers
- To gain knowledge of important current web mining research
- To gain experience presenting technical material
- To learn to write critical reviews of research papers
- To explore a research project in some depth and write and present a technical paper summarizing that work

Teaching materials
- Required Text: Web Data Mining: Exploring Hyperlinks, Contents and Usage data. By Bing Liu, Springer, ISBN 3-450-37881-2.
- Optional Text: Data Mining: Practical Machine Learning Tools and Techniques, 2nd Ed. By Witten and Frank, Morgan Kaufmann
- Papers: Most (perhaps all) available online
  - Authors' homepages
  - Citeseer/ResearchIndex
  - Google Scholar
  - ACM Digital Library
  - IEEE Xplore

Seminars are less formal
- We have a small class
- Introduce yourselves!

Introduction to Web Mining

What is data mining?
- Data mining is also called knowledge discovery in databases (KDD)
- Data mining is the extraction of useful patterns from data sources, e.g., databases, texts, web, images, etc.
- Patterns must be: valid, novel, potentially useful, understandable

Classic data mining tasks
- Classification: mining patterns that can classify future (new) data into known classes.
- Association rule mining: mining any rule of the form X → Y, where X and Y are sets of data items. E.g., Cheese, Milk → Bread [sup = 5%, confid = 80%]
- Clustering: identifying a set of similarity groups in the data
- Sequential pattern mining: a sequential rule A → B says that event A will be immediately followed by event B with a certain confidence

What is web mining?
- The process of discovering knowledge from web page content, hyperlink structure, and usage data
- Builds on existing data and text mining techniques, but adds many new tasks and algorithms
- Three types, based on sources of data (often combined in practice):
  - Web structure mining
  - Web content mining
  - Web usage mining

Importance of web data mining
The web is unique!
- The amount of information is huge and still growing, covers almost any topic, and changes continuously
- No single editorial control: significant variations in quality, much duplication, and data formats vary widely
- Significant information is linked (within and between web sites)
- The Web reflects a virtual society: interactions among people, organizations, and automated systems, no longer limited by geography
The Web presents challenges and opportunities for mining

Importance of web data mining
- Online organizations generate a huge amount of data
- How to make best use of data?
- Knowledge discovered from web data can be used for competitive advantage.
- Online retailers (e.g., amazon.com) are largely driven by data mining.
- Web search engines are information retrieval (text mining) and data mining companies
- Web surfers/searchers need tools to find, recommend, organize, and extract useful information from the Web
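
Worked example: support and confidence
The association rule on the "Classic data mining tasks" slide carries two numbers, sup and confid. The sketch below shows one common way to compute them, assuming the usual definitions: support is the fraction of all transactions that contain every item in X and Y, and confidence is the fraction of transactions containing X that also contain Y. The function name `support_and_confidence` and the five toy baskets are hypothetical, invented for illustration; they are not from the course materials and do not reproduce the 5% / 80% figures on the slide.

```python
from typing import List, Set, Tuple


def support_and_confidence(
    transactions: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> Tuple[float, float]:
    """Return (support, confidence) of the rule antecedent -> consequent.

    Assumed definitions:
      support    = count(X and Y together) / total number of transactions
      confidence = count(X and Y together) / count(X)
    """
    if not transactions:
        raise ValueError("need at least one transaction")

    both = antecedent | consequent
    # Count transactions that contain every item of X, and of X union Y.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)

    support = n_both / len(transactions)
    # If X never occurs, confidence is undefined; report 0.0 by convention here.
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical toy baskets, made up for this example only.
baskets = [
    {"Cheese", "Milk", "Bread"},
    {"Cheese", "Milk", "Bread", "Eggs"},
    {"Cheese", "Milk"},
    {"Milk", "Bread"},
    {"Bread", "Eggs"},
]

sup, conf = support_and_confidence(baskets, {"Cheese", "Milk"}, {"Bread"})
print(f"Cheese, Milk -> Bread [sup = {sup:.0%}, confid = {conf:.0%}]")
```

On these toy baskets, Cheese, Milk, and Bread appear together in 2 of 5 transactions (support 40%), and in 2 of the 3 transactions that contain both Cheese and Milk (confidence about 67%). A real miner would also search over candidate itemsets against minimum support and confidence thresholds; this sketch only scores a single given rule.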