Friday, August 17, 2007

Thesis Update

Thesis Update: Fuzzy Clustering Results

This week I performed a fuzzy clustering analysis using the “fanny” function from the “cluster” package (discussed previously on this blog). I applied it to the Compensation of Employees (Chain Volume Measure) and Gross Value Added data from dataset 5204.0 of the Australian Bureau of Statistics. The goal was to get a feel for how to perform the final analysis on the final data set provided by Dr Poon. The results are as follows:

Fuzzy Clustering object of class 'fanny' :
m.ship.expon. 2
objective 86706.06
tolerance 1e-15
iterations 35
converged 1
maxit 500
n 17
Membership coefficients (in %, rounded):
      [,1] [,2] [,3] [,4]
 [1,]   87    2    6    4
 [2,]   88    2    6    4
 [3,]   15   35   22   28
 [4,]   80    3   10    7
 [5,]   10    4   70   16
 [6,]    6    3   76   15
 [7,]    4    4   14   78
 [8,]   72    4   15    9
 [9,]   15    5   63   17
[10,]   91    1    4    3
[11,]    7    5   51   37
[12,]    1   95    2    2
[13,]    6    4   67   23
[14,]    4    4   14   79
[15,]    8   15   20   57
[16,]   86    2    7    5
[17,]   62    5   21   12
Fuzzyness coefficients:
dunn_coeff normalized
 0.5945847  0.4594462
Closest hard clustering:
[1] 1 1 2 1 3 3 4 1 3 1 3 2 3 4 4 1 1

Silhouette plot information:
   cluster neighbor   sil_width
10       1        3  0.82938820
 1       1        3  0.81032816
 2       1        3  0.80477324
16       1        3  0.80118019
 4       1        3  0.76509472
 8       1        3  0.69297873
17       1        3  0.57146445
 3       2        4  0.06730237
12       2        4 -0.17685372
 6       3        4  0.61400342
 5       3        4  0.60246272
 9       3        1  0.54122724
13       3        4  0.49619115
11       3        4  0.26723303
15       4        3  0.52706369
 7       4        3  0.41649447
14       4        3  0.38930883
Average silhouette width per cluster:
[1] 0.75360110 -0.05477567 0.50422351 0.44428900
Average silhouette width of total data set:
[1] 0.5305671

136 dissimilarities, summarized :
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
  5182.8  21552.0  55122.0  62729.0  85846.0 177810.0
Metric : euclidean
Number of objects : 17
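As a sanity check, several summary figures in this printout can be recomputed from other printed values. The short sketch below does this. It is written in Python purely for illustration, and its helper names are hypothetical. It relies on a few assumptions:

- The closest hard clustering assigns each object to the cluster with its largest membership.
- Dunn's coefficient is F = (1/n) times the sum of all squared memberships.
- The normalized form is (F - 1/k)/(1 - 1/k).
- The overall silhouette width is the size-weighted mean of the per-cluster averages.

The memberships are printed as rounded percentages, so the recomputed Dunn coefficient only approximately matches 0.5945847.

```python
# Re-derive some summary figures of the fanny printout from other printed values.
# All function names are hypothetical helpers, not part of any R package.

# Membership coefficients (in %, rounded) copied from the printout: 17 objects x 4 clusters.
MEMBERSHIP = [
    [87, 2, 6, 4], [88, 2, 6, 4], [15, 35, 22, 28], [80, 3, 10, 7],
    [10, 4, 70, 16], [6, 3, 76, 15], [4, 4, 14, 78], [72, 4, 15, 9],
    [15, 5, 63, 17], [91, 1, 4, 3], [7, 5, 51, 37], [1, 95, 2, 2],
    [6, 4, 67, 23], [4, 4, 14, 79], [8, 15, 20, 57], [86, 2, 7, 5],
    [62, 5, 21, 12],
]
# Average silhouette width per cluster, copied from the printout.
CLUSTER_SIL = [0.75360110, -0.05477567, 0.50422351, 0.44428900]


def closest_hard_clustering(u):
    """Label each object (1-based) with the cluster of its largest membership."""
    return [row.index(max(row)) + 1 for row in u]


def dunn_coefficient(u_percent):
    """Assumed definition: sum of all squared memberships divided by n.
    Only approximate here, because the printed memberships are rounded percentages."""
    return sum((x / 100) ** 2 for row in u_percent for x in row) / len(u_percent)


def normalized_dunn(f, k):
    """Assumed rescaling of Dunn's coefficient from [1/k, 1] onto [0, 1]."""
    return (f - 1 / k) / (1 - 1 / k)


def overall_silhouette(labels, per_cluster):
    """Size-weighted mean of the per-cluster average silhouette widths."""
    sizes = [labels.count(c + 1) for c in range(len(per_cluster))]
    return sum(n_c * s for n_c, s in zip(sizes, per_cluster)) / len(labels)


n, k = len(MEMBERSHIP), len(MEMBERSHIP[0])
labels = closest_hard_clustering(MEMBERSHIP)
print(labels)                                             # [1, 1, 2, 1, 3, 3, 4, 1, 3, 1, 3, 2, 3, 4, 4, 1, 1]
print(round(dunn_coefficient(MEMBERSHIP), 4))             # 0.5949 (printed value: 0.5945847)
print(round(normalized_dunn(0.5945847, k), 4))            # 0.4594
print(round(overall_silhouette(labels, CLUSTER_SIL), 4))  # 0.5306
print(n * (n - 1) // 2)                                   # 136 pairwise dissimilarities
```

The recomputed labels match the "Closest hard clustering" line. The cluster sizes they imply (7, 2, 5 and 3) match the silhouette table. For 17 objects there are 17 × 16 / 2 = 136 dissimilarities, as reported.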

However, after discussion with Dr Poon, these results were deemed incorrect because I had fixed the number of clusters (k = 4) in advance. This week I will therefore attempt to repeat the analysis without setting the number of clusters. The fanny function will not run unless the number of clusters is specified, so I will instead attempt this with the e1071 package in R, as discussed last week. Additionally, if time permits, I will apply the switching regression function from the micEcon package to the dataset provided by Dr Poon.
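For reference, the R documentation for fanny describes the objective it minimizes roughly as shown below. The formula is a paraphrase of that documentation, not something taken from this analysis. The symbols are:

- u_{iv}: the membership of object i in cluster v
- d(i,j): the dissimilarity between objects i and j
- r: the membership exponent (printed above as m.ship.expon. 2)
- k: the number of clusters

The outer sum runs over v = 1, ..., k, so k must be fixed before the memberships can be optimized. This is why fanny requires the number of clusters as an argument.

```latex
C = \sum_{v=1}^{k} \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} u_{iv}^{r}\, u_{jv}^{r}\, d(i,j)}{2\sum_{j=1}^{n} u_{jv}^{r}},
\qquad u_{iv} \ge 0, \quad \sum_{v=1}^{k} u_{iv} = 1 \ \text{for each } i.
```

In the run above, n = 17, k = 4 and r = 2, and the printed "objective 86706.06" is the value of C at convergence.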

I have also started writing my partial draft and expect to complete it by Friday of this week.
