Friday, September 10, 2010

Machine Learning with WEKA

Bernhard Pfahringer (based on material by Eibe Frank, Mark Hall, and Peter Reutemann)

Department of Computer Science University of Waikato, New Zealand


WEKA: A Machine Learning Toolkit


The Explorer
- Classification and Regression
- Clustering
- Association Rules
- Attribute Selection
- Data Visualization


The Experimenter
The Knowledge Flow GUI
Other Utilities
Conclusions


WEKA: the software






- Machine learning/data mining software written in Java (distributed under the GNU General Public License)
- Used for research, education, and applications
- Complements “Data Mining” by Witten & Frank
- Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms

WEKA: versions





- There are several versions of WEKA:
  - WEKA 3.4: “book version” compatible with description in data mining book
  - WEKA 3.5.5: “development version” with lots of improvements
- This talk is based on a nightly snapshot of WEKA 3.5.5 (12-Feb-2007)


WEKA only deals with “flat” files

@relation heart-disease-simplified

% numeric attribute
@attribute age numeric
% nominal attribute
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
% (flat file in ARFF format)

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
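
To make the structure of the listing above concrete, here is a minimal sketch in plain Java that splits a simple ARFF text into its relation name, attribute declarations, and data rows. It is not WEKA's own loader: the names ArffSketch, Parsed, parse, and countMissing are hypothetical, and the sketch assumes an unquoted, comma-separated, non-sparse file in which "?" marks a missing value, as in the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a tiny reader for the simple ARFF subset shown above.
// Assumptions: no quoted values, no sparse rows, "%" starts a comment line.
public class ArffSketch {

    /** Holds the three parts of a flat ARFF file. */
    static final class Parsed {
        String relation = "";
        // attribute name -> type specification ("numeric" or "{ a, b }")
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<String[]> rows = new ArrayList<>();
    }

    static Parsed parse(String text) {
        Parsed parsed = new Parsed();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                // Every line after @data is one instance, one cell per attribute.
                String[] cells = line.split(",");
                for (int i = 0; i < cells.length; i++) {
                    cells[i] = cells[i].trim();
                }
                parsed.rows.add(cells);
            } else if (lower.startsWith("@relation")) {
                parsed.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // Split "name type-spec" at the first run of whitespace.
                String[] parts =
                    line.substring("@attribute".length()).trim().split("\\s+", 2);
                parsed.attributes.put(parts[0], parts.length > 1 ? parts[1] : "");
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return parsed;
    }

    /** Counts cells holding "?", the ARFF marker for a missing value. */
    static int countMissing(Parsed parsed) {
        int missing = 0;
        for (String[] row : parsed.rows) {
            for (String cell : row) {
                if (cell.equals("?")) {
                    missing++;
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
            "@relation heart-disease-simplified",
            "@attribute age numeric",
            "@attribute sex { female, male}",
            "@attribute cholesterol numeric",
            "@data",
            "63,male,233",
            "38,female,?");
        Parsed parsed = parse(arff);
        System.out.println("relation:   " + parsed.relation);
        System.out.println("attributes: " + parsed.attributes.keySet());
        System.out.println("instances:  " + parsed.rows.size());
        System.out.println("missing:    " + countMissing(parsed));
    }
}
```

The sketch keeps every cell as a string. A real loader would also check each value against its declared attribute type, which is what makes the header part of the file useful.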




java weka.gui.GUIChooser






