{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to Earth2Studio \n", "\n", "In this notebook, we will introduce the Earth2Studio Python package and run a example that will demonstrate how to run a simple inference workflow to generate a basic determinstic forecast using one of the built in models of Earth-2 Inference Studio.\n", "\n", "\n", "#### Contents of the Notebook\n", "\n", "- [Earth2Studio](#Earth2Studio)\n", "- [Simple Deterministic Inference](#Simple-Deterministic-Inference)\n", " - [Set Up](#Set-Up)\n", " - [Execute the Workflow](#Execute-the-Workflow)\n", " - [Post Processing](#Post-Processing)\n", "\n", "#### Learning Outcomes\n", "\n", "- Earth2Studio Key Features\n", "- How to instantiate a built in prognostic model\n", "- Creating a data source and IO object\n", "- Running a simple built in workflow\n", "- Post-processing results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Earth2Studio\n", "\n", "Earth2Studio is a Python package built to empower researchers, scientists, and enthusiasts in the fields of weather and climate science with the latest artificial intelligence models/capabilities. With an intuitive design and comprehensive feature set, this package serves as a robust toolkit for exploring this AI revolution in the weather and climate science domain.\n", "\n", "### Package Design\n", "\n", "The goal of this package is to enable the use to extrapolate and build beyond what is implemented in it. The design philosophy of Earth2Studio embodies a modular architecture where the inference workflow acts as a flexible adhesive, seamlessly binding together various specialized software components with well-defined interfaces. Each component within the package serves a distinct purpose in typical inference workflows.\n", "\n", "
\n", "
\n", " \n", "
Model architecture overview.
\n", "
\n", "
\n", "\n", "By viewing the inference workflow as a dynamic connector, Earth2Studio facilitates effortless integration of these components, allowing researchers to easily swap out or augment functionalities to suit their specific needs. We recognize that many users will have their own custom workflow needs, thus encourage users to use the provided features as a starting point to build their own.\n", "\n", "
\n", "
\n", " \n", "
\n", "
\n", "\n", "Significant importance is placed on the interface that enables the connection between the components and the workflow. These are simple python protocols that all variants of a particular component must share. This not only enables a consistent API but also the generalization of workflows.\n", "\n", "### Key Features\n", "\n", "While Earth2Studio contains a large collection of general utilities, functions and tooling the following six are considered the core. For more information on these features, see the dedicated documentation for each.\n", "\n", "- **Built-in Workflows**: Multiple built-in inference workflows to accelerate your development and research.\n", "- **Prognostic Models**: Support for the latest AI weather forecast models offered under a coherent interface.\n", "- **Diagnostic Models**: Diagnostic models for mapping to other quantities of interest.\n", "- **Datasources**: Datasources to connect on-prem and remote data stores to inference workflows.\n", "- **IO**: Simple, yet powerful IO utilities to export data for post-processing.\n", "- **Statistical Operators**: Statistical methods to fuse directly into your inference workflow for more complex uncertainty analysis.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simple Deterministic Inference\n", "\n", "
\n", "
\n", " \n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Set Up**\n", "All workflows inside Earth2Studio require constructed components to be handed to them. In this example, let's take a look at the most basic: `earth2studio.run.deterministic`.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us look at the built in Models, Datasource and IO Backends that are avaialable with Earth2Studio `0.2.0`release. \n", "\n", "\n", "#### Prognostic Model: \n", "\n", "Prognostic models are a class of models that perform time-integration. Thus are typically used to generate forecast predictions.\n", "\n", "The list of Prognostic Models available as of `0.2.0` are: \n", "- **models.px.DLWP** : Deep learning weather prediction (DLWP) prognostic model.\n", "- **models.px.FCN** : FourCastNet global prognostic model.\n", "- **models.px.FengWu** : FengWu (operational) weather model consists of single auto-regressive model with a time-step size of 6 hours.\n", "- **models.px.FuXi** : FuXi weather model consists of three auto-regressive U-net transfomer models with a time-step size of 6 hours.\n", "- **models.px.Pangu24** : Pangu Weather 24 hour model.\n", "- **models.px.Pangu6** : Pangu Weather 6 hour model.\n", "- **models.px.Pangu3** : Pangu Weather 3 hour model.\n", "- **models.px.Persistence** : Persistence model that generates a forecast by applying the identity operator on the initial condition and indexing the lead time by 6 hours.\n", "- **models.px.SFNO** : Spherical Fourier Operator Network global prognostic model.\n", "\n", "\n", "#### Data source : \n", "\n", "Data sources used for downloading, caching and reading different weather / climate data APIs into Xarray data arrays. Used for fetching initial conditions for inference and validation data for scoring.\n", "\n", "The list of Datasources available as of `0.2.0` are: \n", "- **data.ARCO** : Analysis-Ready, Cloud Optimized (ARCO) is a data store of ERA5 re-analysis data currated by Google.\n", "- **data.CDS** : The climate data source (CDS) serving ERA5 re-analysis data.\n", "- **data.GFS** : The global forecast service (GFS) initial state data source provided on an equirectangular grid.\n", "- **data.HRRR** : High-Resolution Rapid Refresh (HRRR) is a North-American weather forecast model with hourly data-assimilation developed by NOAA.\n", "- **data.IFS** : The integrated forecast system (IFS) initial state data source provided on an equirectangular grid.\n", "- **data.IMERG** : The Integrated Multi-satellitE Retrievals (IMERG) for GPM.\n", "- **data.Random(domain_coords)** : A randomly generated normally distributed data.\n", "- **data.WB2ERA5** : ERA5 reanalysis data with several derived variables on a 0.25 degree lat-lon grid from 1959 to 2023 (incl) to 6 hour intervals on 13 pressure levels.\n", "- **data.WB2ERA5_121x240** : ERA5 reanalysis data with several derived variables down sampled to a 1.5 degree lat-lon grid from 1959 to 2023 (incl) to 6 hour intervals on 13 pressure levels.\n", "- **data.WB2ERA5_32x64** : ERA5 reanalysis data with several derived variables down sampled to a 5.625 degree lat-lon grid from 1959 to 2023 (incl) to 6 hour intervals on 13 pressure levels.\n", "- **data.WB2Climatology** : Climatology provided by WeatherBench2,\n", "- **data.DataArrayFile** : A local xarray dataarray file data source.\n", "- **data.DataSetFile** : A local xarray dataset file data source.\n", "\n", "#### IO Backend: \n", "\n", "The IO Backends for used for saving the inference results for further post processing.\n", "\n", "The list of IO Backends available as of `0.2.0` are: \n", "- **io.KVBackend** : A key-value (dict) backend.\n", "- **io.NetCDF4Backend** : A backend that supports the NetCDF4 format.\n", "- **io.XarrayBackend** : An xarray backed IO object.\n", "- **io.ZarrBackend** : A backend that supports the zarr format.\n", "\n", "\n", "For this example, we will be using the following:\n", "\n", "- **Prognostic Model**: Use the built in FourCastNet Model :`earth2studio.models.px.FCN`.\n", "- **Datasource**: Pull data from the GFS data api :`earth2studio.data.GFS`.\n", "- **IO Backend**: Let's save the outputs into a Zarr store :`earth2studio.io.ZarrBackend`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import os\n", "\n", "os.environ['EARTH2STUDIO_CACHE'] = os.getcwd() + \"/outputs/cache\"\n", "os.makedirs(\"outputs\", exist_ok=True)\n", "from dotenv import load_dotenv\n", "load_dotenv()\n", "\n", "from earth2studio.data import GFS\n", "from earth2studio.io import ZarrBackend\n", "from earth2studio.models.px import FCN\n", "\n", "# Prognostic Model - Load the default model package which downloads the check point from NGC\n", "package = FCN.load_default_package()\n", "model = FCN.load_model(package)\n", "\n", "# Data Source - Create the data source\n", "data = GFS()\n", "\n", "# IO Backend - Create the IO handler, store in memory\n", "io = ZarrBackend()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Execute the Workflow**\n", "\n", "With all components initialized, running the workflow is a single line of Python code. \n", "Workflow will return the provided IO object back to the user, which can be used to\n", "then post process. Let us look at the API for Determinstic inference\n", "\n", "```python \n", "\n", "def deterministic(\n", " time: list[str] | list[datetime] | list[np.datetime64],\n", " nsteps: int,\n", " prognostic: PrognosticModel,\n", " data: DataSource,\n", " io: IOBackend,\n", " output_coords: CoordSystem = OrderedDict({}),\n", " device: torch.device | None = None,\n", ") -> IOBackend:\n", " \"\"\"Built in deterministic workflow.\n", " This workflow creates a determinstic inference pipeline to produce a forecast\n", " prediction using a prognostic model.\n", "\n", " Parameters\n", " ----------\n", " time : list[str] | list[datetime] | list[np.datetime64]\n", " List of string, datetimes or np.datetime64\n", " nsteps : int\n", " Number of forecast steps\n", " prognostic : PrognosticModel\n", " Prognostic model\n", " data : DataSource\n", " Data source\n", " io : IOBackend\n", " IO object\n", " output_coords: CoordSystem, optional\n", " IO output coordinate system override, by default OrderedDict({})\n", " device : torch.device, optional\n", " Device to run inference on, by default None\n", "\n", " Returns\n", " -------\n", " IOBackend\n", " Output IO object\n", " \"\"\"\n", "```\n", "\n", "For the forecast we will predict for 20 forecast steps which is 5 days." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import earth2studio.run as run\n", "\n", "nsteps = 20 # Each step has a lead time of 6 hours. \n", "io = run.deterministic([\"2024-01-01\"], nsteps, model, data, io)\n", "\n", "print(io.root.tree())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### **Post Processing**\n", "The last step is to post process our results. Cartopy is a great library for plotting\n", "fields on projections of a sphere. Here we will just plot the temperature at 2 meters\n", "(t2m) 1 day into the forecast.\n", "\n", "Notice that the Zarr IO function has additional APIs to interact with the stored data.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import cartopy.crs as ccrs\n", "import matplotlib.pyplot as plt\n", "\n", "forecast = \"2024-01-01\"\n", "variable = \"t2m\"\n", "step = 4 # lead time = 4 x 6 = 24 hrs\n", "\n", "plt.close(\"all\")\n", "# Create a Robinson projection\n", "projection = ccrs.Robinson()\n", "\n", "# Create a figure and axes with the specified projection\n", "fig, ax = plt.subplots(subplot_kw={\"projection\": projection}, figsize=(10, 6))\n", "\n", "# Plot the field using pcolormesh\n", "im = ax.pcolormesh(\n", " io[\"lon\"][:],\n", " io[\"lat\"][:],\n", " io[variable][0, step],\n", " transform=ccrs.PlateCarree(),\n", " cmap=\"Spectral_r\",\n", ")\n", "\n", "# Set title\n", "ax.set_title(f\"{forecast} - Lead time: {6*step}hrs\")\n", "\n", "# Add coastlines and gridlines\n", "ax.coastlines()\n", "ax.gridlines()\n", "plt.savefig(\"outputs/01_t2m_prediction.jpg\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us now create a simple GIF that would go through all the steps using the below script.\n", "\n", "Kindly note, the below script would take approximately 10 minutes to show the output. " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.animation as animation\n", "from IPython.display import HTML\n", "\n", "forecast = \"2024-01-01\"\n", "variable = \"t2m\"\n", "num_timesteps = 20 # Number of time steps to create GIF\n", "\n", "# Create a Robinson projection\n", "projection = ccrs.Robinson()\n", "\n", "# Create a figure and axes with the specified projection\n", "fig, ax = plt.subplots(subplot_kw={\"projection\": projection}, figsize=(10, 6))\n", "\n", "# Create a function to update the frame\n", "def update(frame):\n", " ax.clear() # Clear the axis for new plot\n", " im = ax.pcolormesh(\n", " io[\"lon\"][:],\n", " io[\"lat\"][:],\n", " io[variable][0, frame],\n", " transform=ccrs.PlateCarree(),\n", " cmap=\"Spectral_r\",\n", " )\n", " ax.set_title(f\"{forecast} - Lead time: {6 * frame } hrs\")\n", " ax.coastlines()\n", " ax.gridlines()\n", " return im,\n", "\n", "# Create an animation\n", "ani = animation.FuncAnimation(fig, update, frames=num_timesteps, blit=False)\n", "\n", "# Save as GIF & Display them - Kindly note, this cell takes around 10 minutes or more to display the output\n", "ani.save('outputs/t2m_prediction_animation.gif')\n", "print(\"Animation of 20 Timesteps\")\n", "HTML(ani.to_html5_video())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Important: Free up GPU Memory!\n", "\n", "Run the below cell to free up GPU memory after training the model before moving to the next notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "os._exit(00)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we had looked at the plot of t2m at the 4th step ( Each step is 6 hours in FCN Model ). In the Next notebook, let us extend this by adding a Diagnostic Model with FCN." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "--- \n", "\n", "Don't forget to check out additional [Open Hackathons Resources](https://www.openhackathons.org/s/technical-resources) and join our [OpenACC and Hackathons Slack Channel](https://www.openacc.org/community#slack) to share your experience and get more help from the community.\n", "\n", "---\n", "\n", "# Licensing\n", "\n", "Copyright © 2023 OpenACC-Standard.org. This material is released by OpenACC-Standard.org, in collaboration with NVIDIA Corporation, under the Creative Commons Attribution 4.0 International (CC BY 4.0). These materials may include references to hardware and software developed by other entities; all applicable licensing and copyrights apply.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.2" } }, "nbformat": 4, "nbformat_minor": 4 }