Kotlin DataFrame: typesafe in-memory structured data processing for JVM
Kotlin DataFrame aims to reconcile Kotlin's static typing with the dynamic nature of data by utilizing both the full power of the Kotlin language and the opportunities provided by intermittent code execution in Jupyter notebooks and REPL.
- Hierarchical — represents hierarchical data structures, such as JSON or a tree of JVM objects.
- Functional — data processing pipeline is organized in a chain of DataFrame transformation operations. Every operation returns a new instance of DataFrame, reusing underlying storage wherever it's possible.
- Readable — data transformation operations are defined in DSL close to natural language.
- Practical — provides simple solutions for common problems and the ability to perform complex tasks.
- Minimalistic — simple, yet powerful data model of three column kinds.
- Interoperable — convertible with Kotlin data classes and collections.
- Generic — can store objects of any type, not only numbers or strings.
- Typesafe — on-the-fly generation of extension properties for type-safe data access with Kotlin-style care for null safety.
- Polymorphic — type compatibility derives from column schema compatibility. You can define a function that requires a specific subset of columns in a dataframe but doesn't care about the other columns.
Integrates with the Kotlin kernel for Jupyter. Inspired by krangl, Kotlin Collections and pandas.
Documentation
Explore the documentation for details.
You can find the following articles there:
- Get started with Kotlin DataFrame
- Working with Data Schemas
- Full list of all supported operations
- Rendering to HTML
Setup
An optional Gradle plugin adds enhanced type safety and schema generation: https://kotlin.github.io/dataframe/schemasgradle.html
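A minimal `build.gradle.kts` sketch of such a setup. It assumes the library is consumed from Maven Central under the `org.jetbrains.kotlinx:dataframe` coordinates and uses 0.13.1 with Kotlin 1.9.22, the most recent row of the version table in this README; adjust both versions to your project.

```kotlin
// build.gradle.kts (illustrative sketch, not a complete build file)
plugins {
    kotlin("jvm") version "1.9.22"
    // Optional: Gradle plugin for schema generation
    id("org.jetbrains.kotlinx.dataframe") version "0.13.1"
}

repositories {
    mavenCentral()
}

dependencies {
    // Core library with the default set of format integrations
    implementation("org.jetbrains.kotlinx:dataframe:0.13.1")
}
```

If you only need the string-based API, the `id(...)` line can be left out.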
Check out the custom setup page if you don't need some of the formats as dependencies, for Groovy, and for configurations specific to Android projects.
Getting started
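A small sketch of a first program, assuming the library is already on the classpath. The CSV content, the column names `name` and `stars`, and the helper `loadSample` are made up for illustration; the sample is written to a temporary file so the example does not depend on the network.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.io.*
import java.io.File

// Hypothetical helper: writes a tiny CSV sample and reads it back as a dataframe.
fun loadSample(): AnyFrame {
    val file = File.createTempFile("repositories", ".csv")
    file.deleteOnExit()
    file.writeText("name,stars\nalpha,700\nbeta,3\ngamma,120\n")
    return DataFrame.readCSV(file)
}

fun main() {
    val df = loadSample()

    // Access a column by name, then a value by row index
    println(df["name"][0])

    // String-based column access inside a row expression:
    // keep only the rows whose "stars" value is above 50
    df.filter { "stars"<Int>() > 50 }.print()
}
```

Column names are plain strings here, so a typo in a name or a wrong type argument only surfaces at runtime.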
Getting started with data schema
Requires the Gradle plugin to work.
The plugin generates an extension properties API for a provided sample of data. Column names and their types become discoverable in code completion.
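To make "extension properties API" concrete, the sketch below writes by hand roughly the kind of properties a generator could produce for a schema with a single `stars` column. The marker interface `Repository`, the property `stars`, and the function `popular` are hypothetical; with the plugin you would not write these properties yourself.

```kotlin
import org.jetbrains.kotlinx.dataframe.ColumnsContainer
import org.jetbrains.kotlinx.dataframe.DataColumn
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.DataRow
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical schema marker; it only serves as a type parameter.
interface Repository

// Hand-written stand-ins for generated extension properties (assumption:
// the real generated code may differ in detail).
@Suppress("UNCHECKED_CAST")
val ColumnsContainer<Repository>.stars: DataColumn<Int>
    get() = this["stars"] as DataColumn<Int>

val DataRow<Repository>.stars: Int
    get() = this["stars"] as Int

// Only the "stars" column is required; any other columns are carried along.
fun popular(df: DataFrame<Repository>): DataFrame<Repository> =
    df.filter { stars > 50 }

fun main() {
    val df = dataFrameOf("name", "stars")(
        "alpha", 700,
        "beta", 3,
    ).cast<Repository>()

    popular(df).print()
}
```

With properties like these in scope, `stars` is checked by the compiler and shows up in completion, unlike the string-based `"stars"<Int>()` form.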
Getting started in Jupyter Notebook / Kotlin Notebook
Install Kotlin kernel for Jupyter
Import the stable dataframe version into the notebook:
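In a Kotlin notebook this is typically done with the kernel's `%use` line magic in a cell of its own. The cell below is notebook content rather than standalone Kotlin, and it assumes the kernel's bundled library descriptors include `dataframe`:

```kotlin
%use dataframe
```

A specific version can be requested in parentheses, for example `%use dataframe(0.13.1)`.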
When a cell with a variable declaration is executed, in the next cell DataFrame provides extension properties based on its data.
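A sketch of how this looks across two notebook cells, assuming the library has already been loaded into the notebook. The column names `name` and `age` are made up for illustration, and the cell boundaries are marked with comments:

```kotlin
// Cell 1: declare a dataframe and execute the cell
val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
)

// Cell 2: after cell 1 has run, properties derived from df's columns
// are available without string lookups
df.filter { age > 18 }
df.age
```

This only works in an interactive session, because the properties are derived from the data that the previous cell actually produced.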
Data model
- DataFrame is a list of columns with equal sizes and distinct names.
- DataColumn is a named list of values. Can be one of three kinds:
    - ValueColumn — contains data
    - ColumnGroup — contains columns
    - FrameColumn — contains dataframes
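The three column kinds can be observed with a short sketch. The data and the helper `people` are made up for illustration; `group(...).into(...)` nests columns into a column group, and `groupBy(...).into(...)` collects each group's rows into a frame column.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*
import org.jetbrains.kotlinx.dataframe.columns.ColumnKind

// Hypothetical sample data
fun people(): AnyFrame = dataFrameOf("firstName", "lastName", "city", "age")(
    "Alice", "Cooper", "London", 15,
    "Bob", "Dylan", "Dubai", 45,
    "Charlie", "Daniels", "London", 20,
)

fun main() {
    val df = people()

    // A plain column of values
    check(df["age"].kind() == ColumnKind.Value)

    // Nest two columns under a new column group called "name"
    val nested = df.group("firstName", "lastName").into("name")
    check(nested["name"].kind() == ColumnKind.Group)

    // One row per city; the "people" column holds a dataframe per row
    val byCity = df.groupBy("city").into("people")
    check(byCity["people"].kind() == ColumnKind.Frame)

    byCity.print()
}
```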
Syntax example
Let us show you what data cleaning and aggregation pipelines could look like with DataFrame.
Create:
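A sketch of the creation step, using made-up flight data. Each column is declared with `columnOf` and takes its name from the delegated property; `createFlights` is a hypothetical helper introduced only for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.*
import org.jetbrains.kotlinx.dataframe.api.*

fun createFlights(): AnyFrame {
    // Create columns; the property name becomes the column name
    val fromTo by columnOf("LoNDon_paris", "MAdrid_miLAN", "londON_StockhOlm")
    val flightNumber by columnOf(10045.0, Double.NaN, 10065.0)
    val airline by columnOf("KLM(!)", "{Air France} (12)", "(British Airways. )")

    // Create a dataframe from the columns
    return dataFrameOf(fromTo, flightNumber, airline)
}

fun main() {
    createFlights().print()
}
```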
Clean:
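A sketch of a cleaning pipeline over deliberately messy, made-up flight data. It uses the string-based column API so that it compiles without generated properties; `rawFlights` and `clean` are hypothetical helpers, and the filling rule (previous flight number plus 10) is just an illustrative choice.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical messy input
fun rawFlights(): AnyFrame = dataFrameOf("fromTo", "flightNumber", "airline")(
    "LoNDon_paris", 10045.0, "KLM(!)",
    "MAdrid_miLAN", Double.NaN, "{Air France} (12)",
    "londON_StockhOlm", 10065.0, "(British Airways. )",
)

fun clean(df: AnyFrame): AnyFrame = df
    // Fill missing flight numbers from the previous row
    // (assumes the first row is never missing)
    .fillNA { "flightNumber"<Double>() }
    .with { (prev()!!["flightNumber"] as Double) + 10 }
    // Convert flight numbers to Int
    .convert { "flightNumber"<Double>() }.to<Int>()
    // Keep only the first run of letters and spaces in the airline name
    .update { "airline"<String>() }
    .with { Regex("[a-zA-Z\\s]+").find(it)?.value?.trim() ?: "" }
    // Split "fromTo" into two new columns
    .split { "fromTo"<String>() }.by("_").into("origin", "destination")
    // Normalize capitalization of city names
    .update { "origin"<String>() and "destination"<String>() }
    .with { it.lowercase().replaceFirstChar(Char::uppercase) }

fun main() {
    clean(rawFlights()).print()
}
```

Every step returns a new dataframe, so `rawFlights()` itself is never modified.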
Aggregate:
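A sketch of an aggregation over already tidy, made-up flight data: rows are grouped by origin and each group is reduced to a flight count and a maximum delay. `tidyFlights` and `summarize` are hypothetical helpers, and the column names are assumptions made for this example.

```kotlin
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical tidy input
fun tidyFlights(): AnyFrame = dataFrameOf("origin", "destination", "delay")(
    "London", "Paris", 23,
    "Madrid", "Milan", 47,
    "London", "Stockholm", 24,
)

fun summarize(df: AnyFrame): AnyFrame =
    df.groupBy("origin").aggregate {
        // Inside this block we are in the context of a single group

        // Number of flights from this origin
        count() into "flights"

        // Largest delay among this origin's flights
        maxOf { "delay"<Int>() } into "maxDelay"
    }

fun main() {
    summarize(tidyFlights()).print()
}
```

The result has one row per distinct origin, with the aggregated values in the `flights` and `maxDelay` columns.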
Check it out on Datalore to get a better visual impression of what happens and what the hierarchical DataFrame structure looks like.
Explore more examples here.
Kotlin, Kotlin Jupyter, OpenAPI, Arrow and JDK versions
This table shows the mapping between main library component versions and minimum supported Java versions.
| Kotlin DataFrame Version | Minimum Java Version | Kotlin Version | Kotlin Jupyter Version | OpenAPI Version | Apache Arrow Version |
|---|---|---|---|---|---|
| 0.10.0 | 8 | 1.8.20 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.10.1 | 8 | 1.8.20 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.11.0 | 8 | 1.8.20 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.11.1 | 8 | 1.8.20 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.12.0 | 8 | 1.9.0 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.12.1 | 8 | 1.9.0 | 0.11.0-358 | 3.0.0 | 11.0.0 |
| 0.13.1 | 8 | 1.9.22 | 0.12.0-139 | 3.0.0 | 15.0.0 |
Code of Conduct
This project and the corresponding community are governed by the JetBrains Open Source and Community Code of Conduct. Please make sure you read it.
License
Kotlin DataFrame is licensed under the Apache 2.0 License.