Abstract
Decision trees are an important classification technique in data mining.
They have proved to be valuable tools for the classification,
description, and generalization of data. Work on building decision trees for data sets
exists in multiple disciplines such as signal processing, pattern recognition, decision
theory, statistics, machine learning and artificial neural networks. This research
deals with the problem of finding parameter settings for a decision tree algorithm in
order to build accurate, small trees and to reduce execution time for a given domain.
The proposed approach (mC4.5) is a supervised learning model based on the C4.5
algorithm for constructing a decision tree. The modification to the C4.5 algorithm has
two phases. The first phase discretizes all continuous attributes instead of
dealing with numerical values directly. The second phase uses the average gain measure
instead of the gain ratio measure to choose the best attribute. The approach was evaluated
on three data sets, all taken from the popular University of California at Irvine (UCI)
data repository. The results obtained from
experiments show that mC4.5 outperforms C4.5: it reduces the total number of
nodes without lowering accuracy, and at the same time increases the accuracy ratio.
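
The two modifications described above can be illustrated with a minimal Python sketch. It is not the mC4.5 implementation. The abstract does not say which discretization method is used, so equal-width binning stands in for it here. It also does not define average gain, so the sketch assumes one common definition: information gain divided by the number of distinct values of the attribute. All function names and the parameter k are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def equal_width_bins(values, k=3):
    """Phase 1 (assumed scheme): map numeric values to bin indices 0..k-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would fall into bin k, so clamp it to k-1.
    return [min(int((v - lo) / width), k - 1) for v in values]


def info_gain(column, labels):
    """Information gain of splitting `labels` by the values in `column`."""
    total = len(labels)
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def average_gain(column, labels):
    """Phase 2 (assumed definition): gain divided by number of distinct values."""
    return info_gain(column, labels) / len(set(column))


def best_attribute(columns, labels):
    """Return the name of the attribute with the highest average gain."""
    return max(columns, key=lambda name: average_gain(columns[name], labels))


# Toy data: a numeric attribute and an identifier-like attribute.
labels = ["yes", "yes", "no", "no"]
columns = {
    "temperature": equal_width_bins([1.0, 2.0, 9.0, 10.0], k=3),
    "record_id": [1, 2, 3, 4],
}
chosen = best_attribute(columns, labels)
```

In this toy data both attributes separate the classes perfectly, so plain information gain cannot tell them apart. Under the assumed definition, dividing by the number of distinct values penalizes the identifier-like attribute, and the discretized temperature is chosen.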