Plotting Large Datasets

Investigate:
open_db_file _show_table_context_menu

Recommendation

Disentangling front-end and back-end

  • Easier development, testing, debugging
  • Easier data wrangling with a DB interface
  • Better performance when retrieving data from DB

Data Types

Benchmarks

  • timeit, best of 50 runs
  • Data on RAMdisk to counter caching variability
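A minimal sketch of the "best of 50 runs" measurement, using only the standard-library timeit module. The workload `build_rows` and the inner loop count of 10 are hypothetical stand-ins; the slides do not show the code that was actually benchmarked.

```python
import timeit


def build_rows(n=1_000):
    """Hypothetical workload standing in for the code under test."""
    return [(i, float(i)) for i in range(n)]


CALLS_PER_RUN = 10  # assumption: inner loop count, not taken from the slides

# repeat=50 yields 50 independent timings of CALLS_PER_RUN calls each.
timings = timeit.repeat(build_rows, repeat=50, number=CALLS_PER_RUN)

# Best of N: the minimum is the run least disturbed by other processes.
best_per_call = min(timings) / CALLS_PER_RUN
print(f"best of 50: {best_per_call * 1e6:.1f} µs per call")
```

Taking the minimum rather than the mean complements the RAM disk: both aim to remove noise that does not come from the code itself.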

Code Investigation:
Expensive Operations

  • type checking
  • nested dicts
  • append to list
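To make the three patterns concrete, here is a hypothetical illustration (not the investigated code itself): one loop that combines all three per item, and a leaner equivalent that assumes the values were validated when they were created and stored flat.

```python
def collect_slow(records):
    """Per-item type check, nested dict lookups, and list.append in one loop."""
    values = []
    for rec in records:
        value = rec["data"]["value"]["v"]   # nested dict lookups
        if not isinstance(value, float):    # type check on every item
            raise TypeError(f"expected float, got {type(value).__name__}")
        values.append(value)                # repeated append
    return values


def collect_lean(records):
    """Same result from flat (t, v) tuples, assumed validated at creation."""
    return [v for _, v in records]


# Hypothetical sample data in both layouts.
nested = [{"data": {"value": {"v": float(i)}}} for i in range(5)]
flat = [(i, float(i)) for i in range(5)]
```

Both functions return the same list; how much time the lean variant saves depends on the data and would need to be measured.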

Data Wrangling

(spinedb_api: by ID)

Recommendation

Typing and Checking during Data Creation

  • Separation of responsibilities
    • Maintainable code
  • Fewer computations at run time
  • Clearer data structure
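One way to sketch this recommendation: validate once in the constructor, so code that consumes the data needs no checks. The class `TimeSeriesPoint` and the function `total` are hypothetical names for illustration, not part of any existing code base.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimeSeriesPoint:
    """Hypothetical value type; all checks run once, at creation."""
    t: datetime
    v: float

    def __post_init__(self):
        if not isinstance(self.t, datetime):
            raise TypeError("t must be a datetime")
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(self.v, bool) or not isinstance(self.v, (int, float)):
            raise TypeError("v must be a number")


def total(points):
    """Consumer code: no type checks needed here."""
    return sum(p.v for p in points)


points = [
    TimeSeriesPoint(datetime(2000, 1, 1, hour), value)
    for hour, value in enumerate((90.0, 91.0, 93.0))
]
```

Because the type is frozen, a point that passed validation cannot later be mutated into an invalid state.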

Data Structure

Dict Nesting

Queryable JSON BLOBs

{
  "data": {
    "2000-01-01T00:00:00.0": 90.0,
    "2000-01-01T01:00:00.0": 91.0,
    "2000-01-01T02:00:00.0": 93.0
  },
  "index": {
    "ignore_year": false,
    "repeat": false
  }
}
              

Queryable JSON BLOBs

{
  "data": [
      {"t": "2000-01-01T00:00:00.0", "v": 90.0},
      {"t": "2000-01-01T01:00:00.0", "v": 91.0},
      {"t": "2000-01-01T02:00:00.0", "v": 93.0}
  ],
  "index": {
    "ignore_year": false,
    "repeat": false
  }
}
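A sketch of what "queryable" can mean for the array-of-records layout above, assuming the BLOB is stored as JSON text in SQLite and that the SQLite build includes the JSON functions (`json_each`, `json_extract`). The table name `parameter_value` and column name `payload` are hypothetical.

```python
import json
import sqlite3

blob = json.dumps({
    "data": [
        {"t": "2000-01-01T00:00:00.0", "v": 90.0},
        {"t": "2000-01-01T01:00:00.0", "v": 91.0},
        {"t": "2000-01-01T02:00:00.0", "v": 93.0},
    ],
    "index": {"ignore_year": False, "repeat": False},
})

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one JSON document per row.
conn.execute("CREATE TABLE parameter_value (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT INTO parameter_value (payload) VALUES (?)", (blob,))

# json_each expands the "data" array into one row per record,
# so the filter runs inside the database rather than in Python.
rows = conn.execute(
    """
    SELECT json_extract(point.value, '$.t') AS t,
           json_extract(point.value, '$.v') AS v
    FROM parameter_value AS pv,
         json_each(pv.payload, '$.data') AS point
    WHERE json_extract(point.value, '$.v') > ?
    ORDER BY t
    """,
    (90.5,),
).fetchall()

print(rows)
```

Only the two records with a value above 90.5 leave the database; the Python side never parses the full document.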
              

Queryable JSON BLOBs: Performance

N.B.: ADBC is now compatible with our data format! 🥳

Discussing Parquet

Recommendation

Using a Queryable Data Structure

  • Speed!
  • Use DB for searching, filtering, etc.
  • Separation of responsibilities (déjà vu)
    • Maintainable code

Questions?