Turn a Git repo into a collection of interactive notebooks
Have a repository full of Jupyter notebooks that use Dask to perform scalable computations? With Pangeo-Binder, open those notebooks in an executable environment, launch a Dask-Kubernetes cluster, access datasets stored on the cloud, and make your code immediately reproducible by anyone, anywhere.
How it works
1
Enter your repository information
In the form above, enter a URL or a GitHub repository that contains Jupyter notebooks, along with a branch, tag, or commit hash. Clicking Launch builds your Binder repository. If you also specify a path to a notebook file, that notebook opens in your browser once the build finishes.
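The same launch can be expressed as a link. The sketch below assembles one from the fields in the form, assuming the `/v2/gh/<org>/<repo>/<ref>` URL layout and the `filepath` query parameter used by BinderHub deployments. The host `binder.example.org` and the repository names are placeholders, not values taken from this page.

```python
from urllib.parse import quote, urlencode


def build_launch_url(host, org, repo, ref="HEAD", notebook_path=None):
    """Assemble a Binder launch URL for a GitHub repository.

    `ref` may be a branch, tag, or commit hash. `notebook_path` is optional
    and names a notebook to open once the build finishes.
    """
    # Percent-encode each path segment so a ref such as "feature/x" stays
    # a single segment.
    segments = "/".join(quote(part, safe="") for part in (org, repo, ref))
    url = f"https://{host}/v2/gh/{segments}"
    if notebook_path:
        # Assumption: the deployment accepts the `filepath` query parameter.
        url += "?" + urlencode({"filepath": notebook_path})
    return url


# Hypothetical host and repository, for illustration only.
print(build_launch_url("binder.example.org", "my-org", "my-repo",
                       ref="main", notebook_path="notebooks/intro.ipynb"))
```

Pinning `ref` to a tag or commit hash rather than a branch name keeps the link pointing at the same code over time.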
2
We build a Docker image of your repository
Binder searches the repository's root directory for a dependency file, such as requirements.txt or environment.yml (see the documentation for more complex dependency setups). These dependency files are used to build a Docker image. If an image has already been built for the current commit of the repository, it is reused rather than rebuilt. If a new commit has been made since the last build, the image is rebuilt automatically.
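The lookup and rebuild rules described above can be illustrated with a small sketch. This is not Binder's actual implementation. The function names, the list of recognized files, and the in-memory set of built images are all assumptions made for illustration.

```python
from pathlib import Path

# Illustrative subset of the dependency files named in the text.
DEPENDENCY_FILES = ("environment.yml", "requirements.txt")


def find_dependency_files(repo_root):
    """Return the known dependency files present in the repository root."""
    root = Path(repo_root)
    return [name for name in DEPENDENCY_FILES if (root / name).is_file()]


def image_key(repo_url, commit):
    """Identify an image by repository and commit (hypothetical scheme)."""
    return f"{repo_url}@{commit}"


def needs_build(built_images, repo_url, commit):
    """True when no image exists yet for this exact commit."""
    return image_key(repo_url, commit) not in built_images


# One image was built earlier for commit "abc123" of a placeholder repository.
built = {image_key("https://github.com/my-org/my-repo", "abc123")}
print(needs_build(built, "https://github.com/my-org/my-repo", "abc123"))  # same commit: reuse
print(needs_build(built, "https://github.com/my-org/my-repo", "def456"))  # new commit: rebuild
```

Keying the image on the commit is what lets an unchanged repository launch quickly while a new commit triggers a fresh build.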
3
Interact with your notebooks in a live environment!
A JupyterHub server will host your repository's contents. We offer you a reusable link and badge to your live repository that you can easily share with others.
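A badge is typically an image wrapped in a link to the launch URL. As a minimal sketch, the helper below produces the Markdown for one. Both URLs are placeholders, and the badge image location depends on the deployment.

```python
def badge_markdown(launch_url, badge_url, alt_text="Binder"):
    """Return Markdown for a badge image that links to `launch_url`."""
    return f"[![{alt_text}]({badge_url})]({launch_url})"


# Hypothetical URLs, for illustration only.
print(badge_markdown(
    "https://binder.example.org/v2/gh/my-org/my-repo/main",
    "https://binder.example.org/badge_logo.svg",
))
```

Pasting the printed line into a README gives readers a one-click way to open the live repository.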
4
Scale your computations across an adaptive Dask cluster
Dask is a flexible parallel computing library for analytics. Dask is the key to the scalability of the Pangeo platform: its data structures can represent extremely large datasets without loading them into memory, and its distributed schedulers let supercomputers and cloud computing clusters parallelize computations efficiently across many nodes. This Binder deployment uses Dask Kubernetes to scale computations on a Kubernetes cluster running on Google Cloud Platform.
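From inside a notebook, starting an adaptive cluster might look like the sketch below. It assumes the classic `dask_kubernetes.KubeCluster` API and that the deployment supplies a default worker template through Dask configuration. Newer dask-kubernetes releases use a different, operator-based interface, so treat the exact calls as an assumption. The function name, worker bounds, and array size are illustrative.

```python
def mean_on_adaptive_cluster(min_workers=1, max_workers=10):
    """Compute the mean of a lazily defined random array on an adaptive cluster."""
    # Third-party imports are kept inside the function so this sketch can be
    # loaded even where Dask is not installed.
    import dask.array as da
    from dask.distributed import Client
    from dask_kubernetes import KubeCluster  # classic API; an assumption

    # Assumes the worker pod template comes from Dask configuration,
    # so no arguments are passed here.
    cluster = KubeCluster()
    # Let the cluster add or remove workers within these bounds as load changes.
    cluster.adapt(minimum=min_workers, maximum=max_workers)
    client = Client(cluster)
    try:
        # Lazy: this only builds a task graph; no array data is allocated yet.
        data = da.random.random((20_000, 20_000), chunks=(2_000, 2_000))
        # compute() sends the graph to the workers and returns the result.
        return float(data.mean().compute())
    finally:
        client.close()
        cluster.close()
```

The array above is described as 100 chunks of 2,000 by 2,000 values, so each worker only ever holds a few chunks at a time rather than the whole dataset.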