Databases

A short presentation on database use in scientific computing.

Database

A system used to store and retrieve large amounts of data. For us, databases are mainly useful for storing simulation and computation output and experimental data.

Database Models

Databases can be structured with several different types of models.
Flat Model (e.g. table)
A table of columns and rows where each column holds a particular data type and the values in each row belong to the same record. The clearest example is a spreadsheet.
Hierarchical
A tree-like structure built on a parent and children model. Useful for describing real-world things like a table of contents or other nested relationships. Examples are XML (Extensible Markup Language) and its relative HTML (HyperText Markup Language). JSON is another, based on the JavaScript object model.
Network
Organized into two constructs, records and sets. Records have fields, which can be hierarchical, and the sets define one-to-many relationships between the records. Similar to the hierarchical model, but it reduces redundant data and speeds up lookup by using pointers. An example is Objectivity/DB.
Relational
A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The fundamental constructs are relations (the tables), attributes (the named columns of the relations) and the domain (the set of values the attributes are allowed to have). Any value occurring in two different tables implies a relationship. A key is typically chosen from one of the attributes, such as a Social Security number (SSN), and can serve as the common link between tables. An example is MySQL.
Object Oriented
Based on the object models used in modern programming; it helps bridge the gap between application programming and database management. An object can be something in the real world, and it has various attributes (e.g. an apple has a name, size, color and average weight). Instances of the object have particular values of the attributes (e.g. name="Granny Smith", size="Medium", color="green", etc.). An example is Zope's object database (ZODB).
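
As a rough illustration of the difference between the hierarchical and relational models, the sketch below stores the same made-up simulation output both ways using only the Python standard library. The table and field names (runs, results, run_id, solver, energy) are invented for the example, and SQLite is used here as a stand-in for a relational database such as MySQL.

```python
import json
import sqlite3

# Hierarchical: each run (parent) owns its results (children).
tree = {
    "runs": [
        {"solver": "euler", "results": [{"step": 0, "energy": 1.00},
                                        {"step": 1, "energy": 1.10}]},
        {"solver": "rk4", "results": [{"step": 0, "energy": 1.00},
                                      {"step": 1, "energy": 1.01}]},
    ]
}
text = json.dumps(tree)      # serialize the tree to a JSON string
restored = json.loads(text)  # parse it back into nested dicts and lists

# Relational: two flat tables linked by the shared key run_id.
conn = sqlite3.connect(":memory:")  # in-memory database, nothing hits disk
conn.executescript("""
    CREATE TABLE runs (run_id INTEGER PRIMARY KEY, solver TEXT);
    CREATE TABLE results (run_id INTEGER, step INTEGER, energy REAL);
""")
conn.executemany("INSERT INTO runs VALUES (?, ?)",
                 [(1, "euler"), (2, "rk4")])
conn.executemany("INSERT INTO results VALUES (?, ?, ?)",
                 [(1, 0, 1.00), (1, 1, 1.10), (2, 0, 1.00), (2, 1, 1.01)])

# The join reassembles the data without reorganizing either table.
rows = conn.execute("""
    SELECT runs.solver, MAX(results.energy)
    FROM runs JOIN results ON runs.run_id = results.run_id
    GROUP BY runs.solver
    ORDER BY runs.solver
""").fetchall()
conn.close()
print(rows)
```

In the nested form the parent-child link is implied by position in the tree; in the relational form it is carried by the run_id value that appears in both tables.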

PyTables

A Python package for managing hierarchical datasets, designed to handle extremely large amounts of data efficiently and easily.

Features

  • Built-in support for NumPy array management
  • Hierarchical data model based on HDF5 (easy connection to MATLAB .mat files, which are HDF5-based as of version 7.3)
  • Table creation and quick lookup based on criteria
  • Table cells can be multidimensional or even nested
  • Various array data containers besides tables
  • System and user-defined metadata
  • Can easily work with tables and/or arrays that don't fit into memory (up to 2**63 rows)
  • Built-in data compression

Example
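
A minimal sketch of several of the features listed above, assuming PyTables 3.x (imported as tables) and NumPy are installed. The file name, the run1 group, the Sample row layout and the solver attribute are all hypothetical names chosen for illustration; older PyTables releases spell these calls in camelCase (e.g. openFile), so adjust to the installed version.

```python
import os
import tempfile

import numpy as np
import tables as tb


class Sample(tb.IsDescription):
    """Row layout for a hypothetical per-step results table."""
    step = tb.Int32Col()
    energy = tb.Float64Col()
    position = tb.Float64Col(shape=(3,))  # a multidimensional cell


path = os.path.join(tempfile.mkdtemp(), "demo.h5")
filters = tb.Filters(complevel=5, complib="zlib")  # built-in compression

with tb.open_file(path, mode="w", title="Demo file", filters=filters) as h5:
    group = h5.create_group("/", "run1", "First simulation run")
    table = h5.create_table(group, "samples", Sample, "Per-step output")
    table.attrs.solver = "rk4"  # user-defined metadata on the table

    row = table.row
    for i in range(10):
        row["step"] = i
        row["energy"] = 1.0 + 0.01 * i
        row["position"] = [i, 2 * i, 3 * i]
        row.append()
    table.flush()  # write buffered rows to disk

    # A plain array container stored next to the table.
    h5.create_array(group, "grid", np.linspace(0.0, 1.0, 5), "Mesh points")

with tb.open_file(path, mode="r") as h5:
    table = h5.root.run1.samples
    # Lookup by criteria: only matching rows are pulled into memory.
    steps = [int(r["step"]) for r in table.where("energy > 1.055")]
    solver = table.attrs.solver
    grid = h5.root.run1.grid.read()

print(steps, solver, grid.shape)
```

The query string passed to where() is evaluated by PyTables while it iterates over the table on disk, which is what lets the same pattern work on tables too large to load at once.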
