Machine Learning and Data Mining: 19 Mining Text And Web Data

Content Preview
Text and Web Mining
Machine Learning and Data Mining (Unit 19)
Prof. Pier Luca Lanzi

References

  • Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", The Morgan Kaufmann Series in Data Management Systems (Second Edition), Chapter 10, part 2
  • Web Mining Course by Gregory Piatetsky-Shapiro, available at www.kdnuggets.com

Mining Text Data: An Introduction

Data Mining / Knowledge Discovery over four kinds of data, illustrated with the same home-loan example:

  • Structured Data: HomeLoan (Loanee: Frank Rizzo, Lender: MWF, Agency: Lake View, Amount: $200,000, Term: 15 years)
  • Multimedia: Loans($200K,[map],...)
  • Free Text: Frank Rizzo bought his home from Lake View Real Estate in 1992. He paid $200,000 under a 15-year loan from MW Financial.
  • Hypertext: <a href>Frank Rizzo</a> bought <a href>this home</a> from <a href>Lake View Real Estate</a> in <b>1992</b>. <p>...

Bag-of-Tokens Approaches

Feature extraction turns each document into a token set.

Document:

Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. Now we are engaged in a great civil war, testing whether that nation, or …

Token set:

  • nation: 5
  • civil: 1
  • war: 2
  • men: 2
  • died: 4
  • people: 5
  • Liberty: 1
  • God: 1
  • …

Loses all order-specific information! Severely limits context!

Natural Language Processing

Example sentence: A dog is chasing a boy on the playground

  • Lexical analysis (part-of-speech tagging): A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun
  • Syntactic analysis (parsing): "A dog", "a boy", and "the playground" are Noun Phrases; "is chasing" is a Complex Verb; "on the playground" is a Prep Phrase; these combine into Verb Phrases and finally the Sentence
  • Semantic analysis: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
  • Inference: adding the rule Scared(x) if Chasing(_,x,_) yields Scared(b1)
  • Pragmatic analysis (speech act): a person saying this may be reminding another person to get the dog back…

(Taken from ChengXiang Zhai, CS 397cxz, Fall 2003)

General NLP: Too Difficult!

  • Word-level ambiguity: “design” can be a noun or a verb (ambiguous POS); “root” has multiple meanings (ambiguous sense)
  • Syntactic ambiguity: “natural language processing” (modification); “A man saw a boy with a telescope.” (PP attachment)
  • Anaphora resolution: “John persuaded Bill to buy a TV for himself.” (himself = John or Bill?)
  • Presupposition: “He has quit smoking.” implies that he smoked before.

Humans rely on context to interpret (when possible). This context may extend beyond a given document!

(Taken from ChengXiang Zhai, CS 397cxz, Fall 2003)

Shallow Linguistics

  • English Lexicon
  • Part-of-Speech Tagging
  • Word Sense Disambiguation
  • Phrase Detection / Parsing

WordNet

An extensive lexical network for the English language:

  • Contains over 138,838 words.
  • Several graphs, one for each part of speech.
  • Synsets (synonym sets), each defining a semantic sense.
  • Relationship information (antonym, hyponym, meronym …)
  • Downloadable for free (UNIX, Windows)
  • Expanding to other languages (Global WordNet Association)
  • Funded >$3 million, mainly government (translation interest)
  • Founder George Miller, National Medal of Science, 1991.

[Figure: synonym and antonym links among the adjectives watery, moist, wet, damp, dry, parched, arid, anhydrous]

Part-of-Speech Tagging

Training data (annotated text):

This/Det sentence/N serves/V1 as/P an/Det example/N of/P annotated/V2 text/N …

The POS tagger takes a new sentence and outputs its tags:

“This is a new sentence.” → Det Aux Det Adj N

Pick the most likely tag sequence, where the joint probability is modeled in one of two ways:

\[
p(w_1,\dots,w_k,t_1,\dots,t_k) =
\begin{cases}
p(t_1 \mid w_1)\cdots p(t_k \mid w_k)\,p(w_1)\cdots p(w_k) & \text{independent assignment (most common tag)}\\[4pt]
\prod_{i=1}^{k} p(w_i \mid t_i)\,p(t_i \mid t_{i-1}) & \text{partial dependency (HMM)}
\end{cases}
\]

Word Sense Disambiguation

“The difficulties of computational linguistics are rooted in ambiguity.”
N Aux V P N

Which sense of “rooted” is meant? Treat the question as supervised learning.

Features:

  • Neighboring POS tags (N Aux V P N)
  • Neighboring words (linguistics are rooted in ambiguity)
  • Stemmed form (root)
  • Dictionary/Thesaurus entries of neighboring words
  • High co-occurrence words (plant, tree, origin, …)
  • Other senses of word within discourse

Algorithms:

  • Rule-based Learning (e.g. IG guided)
  • Statistical Learning (e.g. Naïve Bayes)
  • Unsupervised Learning (e.g. Nearest Neighbor)

(Adapted from ChengXiang Zhai, CS 397cxz, Fall 2003)
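
The Bag-of-Tokens slide warns that token counts discard word order. A minimal Python sketch of that point follows; the function name `bag_of_tokens` and the lowercase, letters-only tokenization are assumptions made for illustration, not something the slides specify.

```python
import re
from collections import Counter


def bag_of_tokens(text):
    """Lowercase the text, keep runs of letters as tokens, and count them.

    The tokenization rule is an assumption for this sketch; the slides do
    not say how tokens are extracted.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(tokens)


# Two sentences with opposite meanings but the same words.
a = bag_of_tokens("A dog is chasing a boy")
b = bag_of_tokens("A boy is chasing a dog")

print(a["a"])   # 2
print(a == b)   # True: the order-specific information is gone
```

Because the two bags compare equal, any model built only on these counts cannot tell who is chasing whom.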
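
The "partial dependency (HMM)" case on the Part-of-Speech Tagging slide scores a tag sequence as a product of emission terms p(w_i | t_i) and transition terms p(t_i | t_{i-1}). One common way to pick the most likely sequence under such a model is the Viterbi algorithm, sketched below in Python. The slides do not name a decoding algorithm, and every probability in the toy tables is invented for illustration (a real tagger would estimate them from annotated text). `start_p` plays the role of p(t_1 | t_0).

```python
def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag path and its joint probability."""
    # best[i][t] = (probability of the best path ending in tag t at word i,
    #               previous tag on that path)
    best = [{t: (start_p.get(t, 0.0) * emit_p[t].get(words[0], 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            emit = emit_p[t].get(words[i], 0.0)
            column[t] = max(
                ((best[i - 1][s][0] * trans_p[s].get(t, 0.0) * emit, s)
                 for s in tags),
                key=lambda pair: pair[0],
            )
        best.append(column)
    # Backtrack from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]


# Toy model: all numbers below are made up for this sketch.
tags = ["Det", "Aux", "Adj", "N"]
start_p = {"Det": 0.8, "N": 0.2}
trans_p = {
    "Det": {"Aux": 0.3, "Adj": 0.3, "N": 0.4},
    "Aux": {"Det": 0.7, "Adj": 0.2, "N": 0.1},
    "Adj": {"N": 0.9, "Adj": 0.1},
    "N": {"Aux": 0.6, "N": 0.2, "Det": 0.2},
}
emit_p = {
    "Det": {"this": 0.5, "a": 0.5},
    "Aux": {"is": 1.0},
    "Adj": {"new": 1.0},
    "N": {"sentence": 0.7, "this": 0.1, "new": 0.2},
}

words = ["this", "is", "a", "new", "sentence"]
path, prob = viterbi(words, tags, start_p, trans_p, emit_p)
print(path)  # ['Det', 'Aux', 'Det', 'Adj', 'N']
print(prob)
```

With these toy numbers the decoder reproduces the tagging shown on the slide, preferring "new" as Adj over N because the Adj-to-N transition makes the whole path more probable.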
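
The Word Sense Disambiguation slide lists neighboring words as features and Naïve Bayes as one statistical learner. The Python sketch below combines the two under loud assumptions: the sense labels (`plant_part`, `origin`), the four training contexts, and the add-one smoothing are all hypothetical choices for illustration and do not come from the slides.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count senses and the context words seen with each sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, context in examples:
        sense_counts[sense] += 1
        for word in context:
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(context, model):
    """Pick the sense with the highest Naive Bayes log score."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    scores = {}
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior p(sense)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            if word in vocab:  # words never seen in training are skipped
                # add-one smoothed p(word | sense)
                score += math.log((word_counts[sense][word] + 1) / denom)
        scores[sense] = score
    return max(scores, key=scores.get)


# Hypothetical training contexts for two senses of "root".
examples = [
    ("plant_part", ["plant", "tree", "soil", "water"]),
    ("plant_part", ["tree", "grow", "soil"]),
    ("origin", ["problem", "cause", "ambiguity"]),
    ("origin", ["difficulties", "origin", "cause"]),
]
model = train(examples)

print(classify(["difficulties", "linguistics", "ambiguity"], model))  # origin
print(classify(["tree", "soil"], model))                              # plant_part
```

The slide's other features (neighboring POS tags, thesaurus entries, other senses in the discourse) could be added as extra items in each context list without changing the classifier.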
