Saturday, February 28, 2015

GDX2SQLite –small option

In the tool GDX2SQLITE we export a GAMS GDX file to a SQLite database file.

The –small option mimics how GAMS stores data internally. All strings (called UELs or Unique Elements) are stored in a pool and all sets, parameters, variables and equations use index numbers for that pool.

To find out how such a pool looks like, you can do

alias (*,pool);
display pool;

The same storage mechanism is used in GDX files: the strings are stored separately.

When we convert a GDX file to a SQLite database there is an option to use the same storage structure. With SQL we can hide the details behind a VIEW: a “virtual” table that is computed on the fly from an underlying SQL SELECT statement.

Here is the mechanism:

image

In the picture above, only the first 6 rows of each table are shown. In reality the tables have a significant number of rows.

In practical models and data sets the number of UELS (ie the size of the string pool) is small (often surprisingly small) compared to the total number of rows in the data tables. The advantage of this storage scheme is that the database files are smaller.

Indeed here we have some statistics:

image

Artificial data often has a larger number of UELs than real world data, as is illustrated here (data.gdx is a synthetic data set while WaterCropData is from a real model).

Notes:

  1. A single UEL pool is used for multiple data tables.
  2. This somewhat related to older LP solvers which had a super-sparsity storage scheme. Not only were the zero elements not stored but also all values were stored in a pool and the coefficient matrix was pointing into this pool.  I don’t think this scheme is used anymore. Most recent reference I could find is http://www.sciencedirect.com/science/article/pii/0360835290900068. The original reference is:
        KALAN, J., "Aspects of Large-Scale In-Core Linear Programming," ACM Proceedings, 1971
    To be complete: Sparsity indicates we don’t store zero elements. Usually we don’t store zero elements in GDX files (as GAMS does not store and export them). This also means zero elements are not stored in the resulting database file either. E.g. in table Data, only combinations (i,v,value) exist where value ≠ 0.
  3. Version 0.5 of GDX2SQLITE adds better column names to the CREATE VIEW statements to make the results more compatible with the default storage scheme.
  4. The –small option is clearly beneficial on database size. What are the impacts on performance of creating the database and on querying the database? Creating the database is probably close to linear in the total size, so I expect some savings here. I have no good answer about querying speed. Note: the column uel in table UELS$ is a primary key.

No comments:

Post a Comment