提交 dfd204df 编写于 作者: rictjo's avatar rictjo

example9.py

上级 b19eeb78
......@@ -485,6 +485,13 @@ full cell -> mitochondrial intermembrane space
```
the definition for the mitochondrion is fully contained within the melanosome membrane definition and so testing that group should try and account for the mitochondrion. This can be done with the `HierarchicalEnrichment` routine exemplified above. We know that the melanosome membrane is associated with sight and that being diabetic is associated with mitochondrial dysfunction, but also that diabetic retinopathy affects diabetics and we also see that there is a knowledge based genetic connection relating these two spatially distinct regions of the cell.
# [Example 9](https://gist.githubusercontent.com/richardtjornhammar/e84056e0b10f8d550258a1e8944ee375/raw/45fb8322487ff3a384e7f56eb06ac1073aee4da1/example9.py): Impetuous [deterministic DBSCAN](https://github.com/richardtjornhammar/impetuous/blob/master/src/impetuous/clustering.py) (search for dbscan)
[DBSCAN](https://en.wikipedia.org/wiki/DBSCAN) is a clustering algorithm that can be seen as a way of rejecting points that are positioned in low dense regions of a point cloud. This introduces holes and may result in a larger segment, that would otherwise be connected via a non dense link to become disconnected and form two segments, or clusters, instead. The rejection criterion is simple. The central concern is to evaluate a distance matrix <img src="https://render.githubusercontent.com/render/math?math=A_{ij}"> with an applied cutoff <img src="https://render.githubusercontent.com/render/math?math=\epsilon"> this turns the distance into a true or false value depending if the pair distance between point i and j are within the distance cutoff. This new binary Neighbour matrix <img src="https://render.githubusercontent.com/render/math?math=N_{ij}"> tells you wether or not two points are neighbours. The DBSCAN criterion states that a point is not part of any cluster if it has fewer than `minPts` neighbors. Once you've calculated the distance matrix you can immediately evaluate the number of neighbors each point has, via <img src="https://render.githubusercontent.com/render/math?math=R_i=(\sum_{j} A_{ij}<\epsilon)-1 < minPts">. If the rejection vector R of a point is True then all the pairwise distances in the distance matrix of that point is set to value larger than epsilon. This ensures that distance matrix search will reject these as Neighbours of any other for the choose epsilon. By tracing out all points that are neighbors and assessing the [connectivity](https://github.com/richardtjornhammar/impetuous/blob/master/src/impetuous/clustering.py) you can find all the clusters.
In this [example](https://gist.githubusercontent.com/richardtjornhammar/e84056e0b10f8d550258a1e8944ee375/raw/45fb8322487ff3a384e7f56eb06ac1073aee4da1/example9.py) we do exactly this for two gaussian point clouds. The dbscan search is just a single line `dbscan ( data_frame = point_cloud_df , eps=0.45 , minPts=4 )`, while the last 27 lines are just there to plot the [results](https://bl.ocks.org/richardtjornhammar/raw/0cc0ff037e88c76a9d65387155674fd1/?raw=true) with [revisions](graph revsion dates : https://gist.github.com/richardtjornhammar/0cc0ff037e88c76a9d65387155674fd1/revisions)
# Notes
These examples were meant as illustrations of some of the codes implemented in the impetuous-gfa package. The impetuous visualisation codes requires [Bokeh](https://docs.bokeh.org/en/latest/index.html) and are still being migrated to work with the latest Bokeh versions.
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册