# A Statistical Learning library for Humans
Decomposes a set of expressions into a group expression. The toolkit currently offers enrichment analysis, hierarchical enrichment analysis, PLS regression, shape alignment or clustering, as well as rudimentary factor analysis.

The expression regulation can be studied via a statistical test that relates it to the observables in the journal file. The final p values are then FDR corrected, and the resulting adjusted p values are reported.
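
The text above does not say which FDR procedure is applied. As a minimal sketch, assuming the common Benjamini-Hochberg procedure, the adjustment of a list of p values can be written as follows. The function name `benjamini_hochberg` is an illustration and not part of the library.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order.

    Illustrative helper, not part of the impetuous API.
    """
    n = len(p_values)
    # indices that sort the p values in ascending order
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # walk from the largest p value down so that the adjusted
    # values never increase as the raw p values decrease
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


p_values = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(p_values))
```

Each raw p value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone.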

Visit the active code via :
https://github.com/richardtjornhammar/impetuous

Visit the published code : 
https://doi.org/10.5281/zenodo.2594690

Cite using :
DOI: 10.5281/zenodo.2594690

# Pip installation with :
```
pip install impetuous-gfa
```
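
After installing, it can be useful to confirm that the package is visible to the Python interpreter you intend to use. The following is a small sketch using only the standard library. The helper name `is_installed` is an illustration, and the import name `impetuous` is taken from the usage examples in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


# Prints True when the pip installation succeeded in this environment
print(is_installed("impetuous"))
```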

# Version controlled installation of the Impetuous library

In order to run these code snippets we recommend that you install the Nix package manager. The Nix download page, as of October 2020:

https://nixos.org/download.html

```
$ curl -L https://nixos.org/nix/install | sh
```

If you cannot install it on Windows, please consider installing the Windows Subsystem for Linux (WSL) first:

```
https://docs.microsoft.com/en-us/windows/wsl/install-win10
```

In order to run the code in this notebook you must enter a sensible working environment. Don't worry! We have created one for you. It is version controlled against Python 3.7, and you can get the file here:

https://github.com/richardtjornhammar/rixcfgs/blob/master/code/environments/impetuous-shell.nix

Since you have installed Nix as well as WSL, or use a Linux (NixOS) or BSD-like system, you should be able to execute the following command in a terminal:

```
$ nix-shell impetuous-shell.nix
```

Now you should be able to start your jupyter notebook locally:

```
$ jupyter-notebook impetuous_finance.ipynb
```

And that's it.

# Usage example 1 : elaborate informatics

code: https://gitlab.com/stochasticdynamics/eplsmta-experiments
docs: https://arxiv.org/pdf/2001.06544.pdf

# Usage example 2 : simple regression code

Now, while in a good environment, you can write the following in your Jupyter notebook or in a dedicated file.py:

```
import pandas as pd
import numpy as np

import impetuous.quantification as impq

# The input files are tab separated; pass the separator by keyword
analyte_df = pd.read_csv( 'analytes.csv' , sep='\t' , index_col=0 )
journal_df = pd.read_csv( 'journal.csv'  , sep='\t' , index_col=0 )

formula = 'S ~ C(industry) : C(block) + C(industry) + C(block)'

res_dfs        = impq.run_rpls_regression ( analyte_df , journal_df , formula , owner_by = 'angle' )
results_lookup = impq.assign_quality_measures( journal_df , res_dfs , formula )

print ( results_lookup )
print ( res_dfs )
```

# Usage example 3 : Novel NLP sequence alignment

Finding a word in a text is a simple and trivial problem in computer science. However, matching a sequence of characters to a larger text segment is not. In this example you will be shown how to employ the impetuous text fitting procedure. The strength of the fit is conveyed via the returned score, with a higher score indicating a stronger match between the two texts. Scoring becomes costly for large texts, so we break the text into segments and words. If there is a strong word-to-word match, then the entire segment score is calculated. The off-diagonal and main-diagonal power terms control how a string shift is evaluated. Fortinbras and Faortinbraaks are probably the same word even though the latter has two character shifts in it. In this example both "requests" and "BeautifulSoup" are employed to parse internet text.

```
import numpy as np
import pandas as pd

import impetuous.fit as impf    # THE IMPETUOUS FIT MODULE
                                # CONTAINS SCORE ALIGNMENT ROUTINE

import requests                 # FOR MAKING URL REQUESTS
from bs4 import BeautifulSoup   # FOR PARSING URL REQUEST CONTENT

if __name__ == '__main__' :

    print ( 'DOING TEXT SCORING VIA MY SEQUENCE ALIGNMENT ALGORITHM' )
    url_       = 'http://shakespeare.mit.edu/hamlet/full.html'

    response   = requests.get( url_ )
    bs_content = BeautifulSoup ( response.content , features="html.parser")

    name = 'fortinbras'
    score_co = 500
    S , S2 , N = 0 , 0 , 0
    for btext in bs_content.find_all('blockquote'):

        theTextSection = btext.get_text()
        theText        = theTextSection.split('\n')

        for segment in theText:
            pieces = segment.split(' ')
            if len(pieces)>1 :
                for piece in pieces :
                    if len(piece)>1 :
                        score = impf.score_alignment( [ name , piece ],
                                    main_diagonal_power = 3.5, shift_allowance=2,
                                    off_diagonal_power = [1.5,0.5] )
                        S    += score
                        S2   += score*score
                        N    += 1
                        if score > score_co :
                            print ( "" )
                            print ( score,name,piece )
                            print ( theTextSection )
                            print ( impf.score_alignment( [ name , theTextSection ],
                                        main_diagonal_power = 3.5, shift_allowance=2,
                                        off_diagonal_power = [1.5,0.5] ) )
                            print ( "" )

    print ( S/N )            # MEAN OF THE WORD SCORES
    print ( S2/N-S*S/N/N )   # VARIANCE OF THE WORD SCORES
```
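
The bookkeeping in the example above can also be tried offline, without a network request. The sketch below is a toy and not the library's algorithm: `toy_shift_score` is a hypothetical stand-in for `impf.score_alignment`, and its parameters `main_power` and `off_power` are assumptions made for illustration. It counts character matches along the main diagonal of the comparison matrix and along diagonals shifted by up to `shift_allowance`. The sums `S`, `S2` and `N` are accumulated as in the example and then turned into a mean and a variance.

```python
def toy_shift_score(a, b, shift_allowance=2, main_power=2.0, off_power=1.0):
    """Hypothetical stand-in score, NOT the impetuous implementation.

    Counts positions where a[i] == b[i + shift] for every shift within
    the allowance and raises each count to a power.
    """
    a, b = a.lower(), b.lower()
    score = 0.0
    for shift in range(-shift_allowance, shift_allowance + 1):
        matches = sum(
            1 for i, ch in enumerate(a)
            if 0 <= i + shift < len(b) and b[i + shift] == ch
        )
        # unshifted matches are weighted with their own power
        power = main_power if shift == 0 else off_power
        score += matches ** power
    return score


def score_statistics(name, words):
    """Accumulate S, S2 and N as in the example; return (mean, variance)."""
    S, S2, N = 0.0, 0.0, 0
    for word in words:
        score = toy_shift_score(name, word)
        S += score
        S2 += score * score
        N += 1
    if N == 0:
        raise ValueError("no words were scored")
    mean = S / N
    return mean, S2 / N - mean * mean


words = ["Faortinbraaks", "Hamlet", "Fortinbras"]
scores = [toy_shift_score("fortinbras", w) for w in words]
mean, variance = score_statistics("fortinbras", words)
print(scores, mean, variance)
```

With this toy score the exact match ranks highest, the shifted spelling ranks above the unrelated word, and the guard on `N` avoids a division by zero when no words were scored.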

# Manually updated code backups for this library :

GitLab:	https://gitlab.com/richardtjornhammar/impetuous

CSDN:	https://codechina.csdn.net/m0_52121311/impetuous