@@ -730,7 +730,7 @@ The "Link" codes are more efficient at creating a link hierarchy of the data but
The "Link" method is therefore not useful for the deterministic treatment of a particle system where all the true connections are important, such as a bulk water system in which you want all your quantum-mechanical waters to be treated at the same level of theory based on their connectivity. This is why I invented my connectivity algorithm in 2009. If you are only doing black-box statistics then this distinction is not important, and computational efficiency is probably what you care about. You can construct hierarchies from both algorithm types, but the connection algorithm will always produce a unique and well-determined structure, while the structure produced by a link algorithm depends on how ties are resolved and which heuristic is employed for construction. The connection hierarchy is exact and deterministic but slow to construct, while the link hierarchies are heuristic dependent and non-deterministic but fast to construct. In the following code example we study this further, as well as the case in which the two are equivalent.
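The determinism claim can be illustrated with a minimal sketch. It is not the `impetuous` implementation: it assumes that connectivity at a cutoff means the connected components of the graph with an edge wherever the distance is at most that cutoff, and it uses `scipy.sparse.csgraph.connected_components` as a stand-in. The coordinates, the cutoff and the helper names `connectivity_labels` and `as_partition` are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def connectivity_labels(distm, cutoff):
    """Label each point by its connected component in the graph distm <= cutoff."""
    adjacency = (distm <= cutoff).astype(int)
    _, labels = connected_components(adjacency, directed=False)
    return labels

def as_partition(labels):
    """Order-independent view of a labelling: a set of frozensets of indices."""
    return {frozenset(np.flatnonzero(labels == l)) for l in np.unique(labels)}

# Hypothetical 1D coordinates; the pairs (0,1), (1,2) and (3,4) tie at distance 1
coords = np.array([0.0, 1.0, 2.0, 5.0, 6.0])
distm = np.abs(coords[:, None] - coords[None, :])

partition = as_partition(connectivity_labels(distm, cutoff=1.0))

# Present the same points in a different order and map the result back
perm = np.array([3, 0, 4, 1, 2])
distm_p = distm[np.ix_(perm, perm)]
labels_p = connectivity_labels(distm_p, cutoff=1.0)
partition_p = {frozenset(perm[list(c)]) for c in as_partition(labels_p)}

print(partition == partition_p)
```

Under these assumptions the partition depends only on which distances fall below the cutoff, so the ties and the ordering of the points do not change it.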
## 14.1 Link hierarchy construction
The following code produces two distance matrices. One has distance ties and the other does not. The second matrix is well known, as is its correct minimal linkage hierarchy. Let's compare the results between scipy and our method.
```
import numpy as np
...
...
@@ -809,7 +809,66 @@ We study the results below
```
We see that the only difference for these two examples is how the unclustered indices are treated. In our method they are set to the identity distance value of zero, while scipy assigns them the lowest off-diagonal value in the distance matrix.
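For reference, the scipy side of such a comparison can be sketched on its own. The block below is a minimal example on a hypothetical tie-free distance matrix, not the matrices used above, and it only shows how single linkage is run and cut at a distance with `scipy.cluster.hierarchy`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical 1D coordinates chosen so that all pairwise distances are distinct
coords = np.array([0.0, 1.0, 3.0, 7.0])
distm = np.abs(coords[:, None] - coords[None, :])

# scipy expects the condensed (upper triangular) form of the distance matrix
condensed = squareform(distm)
Z = linkage(condensed, method="single")

# The third column of Z holds the distance at which each merge happens
merge_distances = Z[:, 2]

# Flat clusters obtained by cutting the hierarchy at distance 2
labels = fcluster(Z, t=2.0, criterion="distance")
print(merge_distances, labels)
```

Without ties the merge order is fixed by the distances alone, which is the case where the link and connection hierarchies can be expected to agree.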
## 14.2 Connectivity construction
Now we employ the `connectivity` algorithm for construction. In the code segment below, the first loop calls the function directly and the second calls the `impetuous.hierarchy_matrix` function,
and we see that the system has 5 unique levels. The hierarchy matrix increases in distance as you traverse down. The first row is level `0` with distance `0`, and every item is assigned to its own cluster. The third row, level `2`, contains three clusters at distance `3`: `0.2.4`, `1` and `3`. They become joined at level `3`, corresponding to distance `5`.
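The row-per-level layout described above can be sketched without `impetuous`. The block below is an assumption-laden illustration, not the library's `hierarchy_matrix`: it takes each unique distance as a cutoff, labels the connected components with `scipy.sparse.csgraph.connected_components`, and keeps a row only when the clustering changes. The function name `hierarchy_levels` and the example coordinates are hypothetical, so the levels differ from the 5 levels reported above.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components

def hierarchy_levels(distm):
    """Return (distances, label_rows), one row of labels per distinct clustering."""
    kept_distances, rows = [], []
    # np.unique returns the sorted distances, including the zero diagonal
    for d in np.unique(distm):
        _, labels = connected_components((distm <= d).astype(int), directed=False)
        # Skip cutoffs that do not change the clustering
        if rows and np.array_equal(rows[-1], labels):
            continue
        kept_distances.append(float(d))
        rows.append(labels)
    return np.array(kept_distances), np.array(rows)

# Hypothetical 1D coordinates with distinct pairwise distances
coords = np.array([0.0, 1.0, 3.0, 7.0])
distm = np.abs(coords[:, None] - coords[None, :])

levels, hmat = hierarchy_levels(distm)
for d, row in zip(levels, hmat):
    print(d, row)
```

As in the description above, the first row sits at distance `0` with every item in its own cluster, and the number of clusters shrinks as the distance grows down the rows.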
The final complete clustering results can be obtained in this alternative way for the `connectivity` hierarchy