Classification is a commonplace problem nowadays. I have lately been working on classification problems for work. I have used Matlab, Python and some lower-level languages for classification, but those approaches turned out to be tedious and rather hard.
R has a package for classification with neural networks (NN), called 'neuralnet', which can be installed directly using: install.packages("neuralnet").
To give an example in this post, we need a data set. The Iris data set will be enough; it is perhaps the best-known database in the pattern recognition literature. We can download it from HERE.
The data set consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. The goal is to predict the species of a sample from the combination of these four features.
Once the neuralnet library has been installed and the data set has been downloaded, we are ready to go.
The first step is to import the library:
library(neuralnet)
After that, we need to load the data file into R:
irisData = read.table("iris.data", sep=",")
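Before going further, it can help to confirm that the file was read as expected: 150 rows, 5 columns, and 50 samples per species. The sketch below runs the same checks on R's built-in copy of the data set (`iris`), used here only as a stand-in so the example runs without the download; the name `loaded` is hypothetical, and with the real file you would run the same two calls on irisData.

```r
# Built-in copy of the Iris data, used as a stand-in for the downloaded file
data(iris)
loaded = iris

# Number of rows and columns: expect 150 samples and 5 columns
dim(loaded)

# Number of samples per class (fifth column): expect 50 for each species
table(loaded[[5]])
```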
To make the rest of the process easier, we can change the column names with the next line:
colnames(irisData) = c("Sep.Len", "Sep.Wid", "Pet.Len", "Pet.Wid", "Class")
We can now use our neural network. First, though, we need some parameters: the number of hidden layers and the nodes in each one, the training set, and the class labels. We can set them up as follows.
#2 hidden layers with 3 and 4 nodes respectively
hidd = c(3, 4)
#The training set is a sample of size 70, from the 150 in the data set
train = irisData[sample(1:150, 70),]
#Classes labeling
train$setosa = c(train$Class == "Iris-setosa")
train$versicolor = c(train$Class == "Iris-versicolor")
train$virginica = c(train$Class == "Iris-virginica")
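Because sample() draws different rows on every run, the training set changes each time. A minimal sketch of one way to make the split repeatable and to keep track of the rows that were not used for training is shown here. The seed value and the names trainIdx and testIdx are assumptions made for illustration; they are not part of the original code.

```r
# Fix the random seed so the same rows are drawn on every run
# (42 is an arbitrary choice)
set.seed(42)

n = 150                            # number of samples in the data set
trainIdx = sample(1:n, 70)         # row numbers used for training
testIdx = setdiff(1:n, trainIdx)   # the remaining 80 rows, never seen in training

# With the data frame from this post, the two subsets would be:
# train = irisData[trainIdx, ]
# test  = irisData[testIdx, ]
```

Keeping testIdx around makes it possible to evaluate the network later on samples it was not trained on.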
And now we can train our NN with the next line:
inet = neuralnet(setosa+versicolor+virginica~Sep.Len+Sep.Wid+Pet.Len+Pet.Wid,train, hidden=hidd, lifesign="full")
After it is trained, we can validate it.
predict = compute(inet, irisData[1:4])
Note the compute function, which receives the NN and the set to validate. In this case, our validation set is the whole data set, columns 1 to 4, so it also includes the 70 samples used for training.
To see the results, we can check predict$net.result: it holds the predicted outputs for each sample in the data set, one row per sample and one column per class (setosa, versicolor, virginica).
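The raw outputs are scores rather than class names, so a common next step is to pick the column with the highest score in each row and compare it with the true label. The sketch below shows that idea on a small, made-up score matrix so that it runs on its own; the values and the names scores, classes, actual and predicted are hypothetical. With the objects from this post, predict$net.result would take the place of scores and irisData$Class the place of actual, assuming the columns follow the order setosa, versicolor, virginica used in the formula.

```r
# Made-up network outputs for four samples (illustration only):
# one row per sample, one column per class
scores = matrix(c(0.98, 0.01, 0.02,
                  0.03, 0.90, 0.10,
                  0.01, 0.35, 0.70,
                  0.02, 0.60, 0.45),
                ncol = 3, byrow = TRUE)

# Class names in the same order as the score columns
classes = c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# True labels for the four made-up samples
actual = c("Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-virginica")

# For each row, take the column with the highest score as the predicted class
predicted = classes[max.col(scores, ties.method = "first")]

# Cross-tabulate true against predicted labels
confusion = table(actual = actual, predicted = predicted)
print(confusion)

# Fraction of samples classified correctly
accuracy = mean(predicted == actual)
print(accuracy)
```

In this toy example the last sample is misclassified as versicolor, so three of the four predictions are correct.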