intake-dataframe-catalog

A simple intake plugin for a searchable table of intake sources and associated metadata.


License: Apache-2.0

Overview

intake-dataframe-catalog is a simple intake plugin for a searchable table of intake sources. The table is represented in memory as a pandas DataFrame and can be serialized and shared as a CSV file. Each row in the dataframe catalog corresponds to an intake source, and the columns contain metadata associated with each source that a user may want to peruse or search. The original use case for intake-dataframe-catalog was to provide a user-friendly catalog of a large number of intake-esm datastores. It lets users peruse and search core metadata from each intake-esm datastore to find the datastores most relevant to their work (e.g. “which datastores contain model X and variable Y?”). Once a user has found the datastore(s) of interest, they can load those datastores and access the data they reference.
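The idea of a searchable table of sources can be sketched with plain pandas. This is only an illustration of the concept described above, not the plugin's own API: the column names (`name`, `model`, `variable`), the `;` separator for multiple variables, and the `find_sources` helper are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical table: one row per source, columns hold searchable metadata.
# Multiple variables per source are stored as a ";"-separated string
# (an assumption for this sketch) so the table survives a CSV round trip.
catalog = pd.DataFrame(
    {
        "name": ["store_a", "store_b", "store_c"],
        "model": ["X", "X", "Z"],
        "variable": ["Y;tas", "pr", "Y"],
    }
)

# Serialize to CSV text and read it back, as if the file had been shared.
csv_text = catalog.to_csv(index=False)
shared = pd.read_csv(io.StringIO(csv_text))


def find_sources(table, model, variable):
    """Return the names of sources that contain the given model and variable."""
    has_variable = table["variable"].str.split(";").apply(
        lambda variables: variable in variables
    )
    return table.loc[(table["model"] == model) & has_variable, "name"].tolist()


# "Which datastores contain model X and variable Y?"
matches = find_sources(shared, model="X", variable="Y")
print(matches)
```

In this sketch the search is an ordinary boolean filter on the metadata columns; the names it returns stand in for the sources a user would then load.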

Why?

Intake already provides the ability to nest sources in a catalog and search across them. However, data discoverability is limited in the case of very large numbers of nested sources, and the search functionality does not readily provide the ability to execute complex searches on nested source metadata. intake-dataframe-catalog aims to provide a very simple catalog of intake sources that emphasises source search and discoverability.

Get in touch

If you encounter any issues with intake-dataframe-catalog or you’d like to request new features, please open an issue on the project's issue tracker.