Unverified commit 61561aa7 · Author: PyCaret · Committer: GitHub

Add files via upload

Parent fe0cd4c8
## Welcome to PyCaret!
PyCaret is a free, open-source, low-code machine learning library for supervised and unsupervised learning in Python. Its primary objective is to reduce the cycle time from hypothesis to insight and make data scientists more productive in their experiments. It does so by providing a high-level API that is sophisticated yet easy to use and consistent across all modules. PyCaret enables data scientists and analysts to run iterative, end-to-end data science experiments efficiently, so they reach conclusions faster. Because the API is high-level and low-code, the time spent coding experiments drops drastically, which lets businesses restructure their machine learning workflows and re-evaluate the value chain of their data science projects. PyCaret is essentially a Python wrapper around several machine learning frameworks and libraries, such as scikit-learn, XGBoost, Microsoft LightGBM, and spaCy, to name a few.
___
## Current Release
The current release is beta 0.0.37 (as of 08/02/2020). A full public release is expected by the end of Feb 2020.
___
## Who should use PyCaret?
PyCaret is a free and open-source library that is easy to install and can be set up locally or on any cloud service within minutes. There is no limitation on its use; however, in our opinion, the following are the ideal target audience: <br />
......@@ -20,8 +16,6 @@ PyCaret is free and open source library which is easy to install and can be setu
- Small to mid-size companies looking to implement data science projects without committing significant resources.
- Students, academics, researchers, and data science professionals seeking to combine the simplicity of the R language with the power of Python.
___
## Installation
#### Dependencies
......@@ -33,12 +27,10 @@ The easiest way to install pycaret is using pip.
```python
pip install pycaret
```
___
## PyCaret is really low code
PyCaret is not only simple and easy to use but also easy to maintain. It allows data scientists to perform end-to-end experiments and extends their ability to handle simple to complex tasks without writing and maintaining extra lines of code. By removing the hindrance of coding, we allow data scientists to be more creative and to focus on business problems.
___
## PyCaret's sophisticated Pipeline
PyCaret is simple and easy to use, but its functionality goes beyond the basics. PyCaret's architecture is deployment-ready: as you perform an experiment, all the steps are automatically saved in a pipeline that can be deployed into production with ease.
......@@ -64,34 +56,24 @@ When a model is constructed using PyCaret, it becomes part of pipeline. A pipeli
PyCaret automatically orchestrates all of the dependencies between pipeline steps. Once a pipeline is constructed, it can be transferred to another environment and run on different hardware to perform tasks at scale. When an experiment is initialized in PyCaret, a pseudo-random seed is generated and distributed to all processes and sub-processes in PyCaret, which allows the experiment to be reproduced later. By organizing an experiment as a pipeline, PyCaret supports the computer science imperative of modularization, i.e. each component should do only one thing. Modularity is vital to deploying data science projects successfully. A pipeline helps you keep the experiment organized; without one, it is difficult to manage the entire machine learning experiment, and deployment requires a lot of technical expertise and resources.
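The two ideas in this paragraph, single-purpose steps and one seed shared by every step, can be illustrated with a minimal sketch. This is not PyCaret's implementation: `SimplePipeline` and the three step functions are hypothetical names invented here, and only the Python standard library is used.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical step signature: each step receives the data and the shared
# random generator, and returns new data. It does exactly one thing.
Step = Callable[[List[float], random.Random], List[float]]


def shuffle_rows(data: List[float], rng: random.Random) -> List[float]:
    """Return a shuffled copy of the data, using only the shared generator."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    return shuffled


def take_train_split(data: List[float], rng: random.Random) -> List[float]:
    """Keep the first 70% of the rows (integer arithmetic avoids float rounding)."""
    cut = len(data) * 7 // 10
    return data[:cut]


def center(data: List[float], rng: random.Random) -> List[float]:
    """Subtract the mean so the values are centered on zero."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


class SimplePipeline:
    """Illustrative pipeline: an ordered list of steps plus one seed."""

    def __init__(self, steps: Sequence[Step], seed: int) -> None:
        self.steps = list(steps)
        self.seed = seed

    def run(self, data: List[float]) -> List[float]:
        # One generator is created from the seed and handed to every step,
        # so the whole run is determined by (steps, seed, input data).
        rng = random.Random(self.seed)
        for step in self.steps:
            data = step(data, rng)
        return data


pipeline = SimplePipeline([shuffle_rows, take_train_split, center], seed=42)
first_run = pipeline.run(list(range(10)))
second_run = pipeline.run(list(range(10)))
print(first_run == second_run)  # the same seed and steps give the same result
```

Because the pipeline object carries both its steps and its seed, moving it to another environment carries everything needed to repeat the run.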
___
## PyCaret is seamlessly integrated
PyCaret and its machine learning capabilities can be seamlessly integrated into any environment that supports Python integration, such as Microsoft Power BI, Tableau, Alteryx, Informatica, and KNIME, to name a few. Users of these platforms can integrate PyCaret into their existing workflows to add a layer of machine learning to their BI applications, easily and free of charge.
PyCaret also has rich magic functions for unsupervised learning. Magic functions are essentially shortcuts (one-word code) that can be executed within an existing ETL pipeline. They allow business analysts and domain experts to integrate machine learning and apply sophisticated techniques, be it Density-Based Spatial Clustering for segment analysis or Isolation Forest for outlier detection. Analysts can thus leverage advanced analytics and machine learning within their comfort zone, without writing hundreds of lines of code.
___
## Reproducibility using PyCaret
Reproducing an entire experiment with the same results is often a challenge because of the randomization involved in machine learning. Many preprocessing steps and algorithms depend on a randomized component that is controlled through a pseudo-random number generator. Many libraries, including scikit-learn, do not have their own random generator. A global random state can be set in an environment; however, it is prone to modification by other code during execution. Thus, the only way to achieve replicability is to pass a random state instance to every function. For a typical machine learning experiment with iteration, this could mean well over fifty (50) places to define it. PyCaret's architecture solves this problem by tracing back all the functions and distributing a unified random state instance globally. When an environment is initialized, PyCaret generates a pseudo-random number, which is also displayed in the grid printed after setup is complete. You can pass the same number to setup as a parameter to reproduce exactly the same results in any other environment.
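The difference between a global random state and an explicitly passed one can be seen with the standard library alone. The sketch below is only an illustration of the problem described above, not PyCaret code; `noisy_sample` is a hypothetical helper standing in for any randomized preprocessing step.

```python
import random


def noisy_sample(values, k, rng=None):
    """Draw k items; falls back to the global generator when no rng is passed."""
    chooser = rng if rng is not None else random
    return chooser.sample(values, k)


values = list(range(100))

# Global state: the seed is set once, but any other code that draws a random
# number in between shifts the sequence.
random.seed(123)
global_first = noisy_sample(values, 5)

random.seed(123)
random.random()  # stand-in for unrelated code consuming a random number
global_second = noisy_sample(values, 5)  # no longer guaranteed to match global_first

# Explicit state: each call receives its own seeded generator, so unrelated
# draws from the global generator cannot interfere.
explicit_first = noisy_sample(values, 5, rng=random.Random(123))
random.random()  # the same interference as above
explicit_second = noisy_sample(values, 5, rng=random.Random(123))

print(explicit_first == explicit_second)  # True: the result depends only on the seed
```

Passing the generator explicitly is what has to be repeated in every randomized call of a hand-written experiment, which is the bookkeeping a single distributed seed removes.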
___
## PyCaret Deployment abilities
PyCaret also supports deploying models to the cloud for use in production, and it does so in a very simple way. This capability reduces, to some extent, the reliance of data science practitioners on data engineers, allowing the latter to focus on large-scale systems. As of the first public release, only AWS S3 is supported for deployment. This feature is in preview for now and only supports batch predictions. We are working to improve the functionality and to include support for other cloud providers such as Microsoft Azure and Google.
___
## PyCaret Runtime Visibility
One common frustration when running an experiment is not knowing how long a particular model will take to train. The wait can stretch to hours or days. Showing the remaining time of an experiment as your code progresses is a major feature PyCaret has to offer (only for Notebook users, since it uses HTML). With this runtime visibility, data scientists and citizen data scientists can anticipate the total run time from the initial estimate shown by PyCaret. This can inform the decision to run or skip certain models, information that would otherwise not be available until training is complete.
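The source does not describe how PyCaret computes its estimate. As a rough illustration of the general idea, one simple approach, assumed here, is to extrapolate from the average time per completed step. The function name and its parameters are hypothetical.

```python
def estimate_remaining_seconds(elapsed_seconds, steps_done, steps_total):
    """Linear extrapolation: average time per finished step times steps left.

    Returns None when nothing has finished yet, because there is no basis
    for an estimate at that point.
    """
    if steps_done <= 0:
        return None
    if steps_done >= steps_total:
        return 0.0
    seconds_per_step = elapsed_seconds / steps_done
    return seconds_per_step * (steps_total - steps_done)


# Example: 3 of 10 cross-validation folds finished after 90 seconds.
remaining = estimate_remaining_seconds(90.0, 3, 10)
print(remaining)  # 210.0
```

An estimate like this is only as good as the assumption that the remaining steps cost about as much as the finished ones, so it is most useful as an early signal for whether a model is worth waiting for.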
___
## License
Copyright 2019-2020 Moez Ali <moez.ali@queensu.ca>
......