Capt. Horatio T.P. Webb
MIS 4372 Database Alternatives: Models, Scaling, and Shardes
Parks -- Fall 2014

Version 1 -- Last Updated 9:00 PM 4/19/2013

  1. A SHORT HISTORICAL PERSPECTIVE

    Data Processing began with:

    Until the 1960s, all data processing was predicated on "sequential data" stored media such as punched cards and tapes (IBM referred to such methods as QSAM -- seQuential Access Method). These devices were called "unit record" equipment since they processed punched cards. The concept of a "record" thus became associated with a "card" as a unit of data (multiple characters puched sequentially on a card). Later in the 1950s, punched cards could also be transfered to magnetic tape (see 7 track tape)-- but processing was still sequential.

    IBM owned the the majority of the computer market until the late 1990s (see this IBM timeline). Their hardware solutions for data processing enjoyed an unprecedented popularity because of IBM's constant innovation, effective marketing and excellent technical support. Major releases of IBM mainframe hardware were:

    See the IBM competitors -- the BUNCH (IBM and the seven dwarfs)

    Contrary to what most people (people under 60 years of age) say about computers and the Internet: OLTP (on-line transaction processing) is nothing new; Al Gore didn't invent the internet; and there is not much really, really new. It was been done quite well for decades and decades. There is nothing new under the sun.

    The steps beyond reading "sequential" data files from cards and tapes are:

    1. ISAM -- Indexed Sequential Access Method

      Generally, the concept of non-sequential access file is predicated on a file that contains:

      • multiple "records" (rows) each containing multiple fields of data -- (like rows in columnar table data);
      • each record must have a unique identifier (column) called an "index" or "key"; and
      • associated with each "key" is the physical location of the data (the "storage address" of the "record")

      With this data file, an "index file" can be constructed containing a pair of data for each record:

      • a "key" -- i.e., a column name with unique data values in each row
      • a "disk location" for each record

      like this:

      Disk Storage Location (Physical Address)
       Disks contained multiple circular platters (each platter has top and bottom surfaces).
      • Disks may contain multiple circular platters with data on both the top and bottom platter surface.
        Each surface has a read/write head that puts or gets data from the surface.
      • Each surface is divided into concentric circles called "tracks".
        Vertically adjacent tracks (directly above and below one another on different surfaces) are called "cylinders".
      • Each track is divided into angular "sectors".

        Thus, the objective is to retrieve the "sector" were the data resides

      The disk location for data is specified by three data values:

      • which cylinder? (i.e., the R/W heads move in or out to be position directly over the correct cylinder -- "seek time")
      • which surface is the track? (i.e., this specifies which R/W head)
      • which sector? (i.e., which angular sub-division of the track. This "rotational latency" (i.e., how long for the drive to spin around for the desired sector to be under the R/W heads. This is the average the time for 1/2 a disk rotation. If the disk is spinning at 7,200 rotations per minute (RPM), then:
        rotations per second = 120,
        rotations per millisecond= 0.120,
        number of half-rotations per millisecond is 0.24 or 240 half rotations per second
        so, the rotational latency is:
        4.17 milliseconds for 7,200 RPM

      Data transfer time depends on how long it takes for the data to pass under the read/write heads. If we have 100 sectors along one track and the disc rotation speed is 120 rotations per second, then to transfer 1 sector would be:

      transfer time = time for one rotation / number of sectors
      = 8.3 milliseconds / 100
      = .083 milliseconds
      The "typical total time" to access data is the sum of:

             seek time       +   rotational latency + data transfer time 
                                                      
          4 milliseconds     +     4 milliseconds   +       .083
      
                             →      8.083 milliseconds  total time/sector
      

      This is ONLY about 125 sectors per second -- astoundingly slow. This is generally the reason large data applications can appear to be incredibly slow when data volume gets large (see the solution to this problem below)

      Additional information required is:

      • Volume Number -- which disk drive -- by name or number;
      • sector organization

        the entire sector is transferred to the input/output buffer (located in memory).

        Once in the I/O buffer, the specific breakdown of the sector data may depend on "blocks". "Blocks" are a group of contiguous of "records" (i.e. table rows) and each block in a sector is numbered and each record in a block is also numbered. So two pieces of data are needed:

        1. Block Number -- the block may be within a sector or span multiple sectors
        2. Record Number (position in the block) -- This determines which record in the block

      Below is a sample INDEX FILE that would contain the data for each record's disk location (drive, cylinder, surface (or head), and sector). It is shown with:

      • platters: 8
      • surfaces: 16 (1 surface on top of the platter AND one surface on the bottom of the platter)
      • tracks: 12 per surface
      • sectors: 12 per tracks (30 degrees each)
      • blocks: 3 per sector
      • records 3 per block

      Assume we are trying to find the record with KEY=98. Searching the index file below, we find the KEY=98 value and retrieve its disk address:

      
                   DISK ADDRESS                   BUFFER INFO
      [VOLUME=DF3 CYLINDER=6 HEAD=1 SECTOR=3] [BLOCK=2 RECORD=2]
      

      [ All records in  Volume=DF3 CYLINDER=6 SECTOR=3  is shown are blue.]

      INDEX FILE
       KEY 
       Volume - Cylinder - R/W Head - Sector - Block - Record
       Number    Number     Number    Number   Number  Number
      6
         DF3       6          1         3        2       3
      8
         DF3       6          1         4        1       2
      9
         DF3       6          1         3        3       1
      11
         DF3       6          1         4        2       3
      13
         DF3       6          1         3        1       2
      18
         DF3       6          1         4        3       3
      23
         DF3       6          1         3        3       3
      24
         DF3       6          1         3        2       1
      44
         DF3       6          1         4        1       1
      53
         DF3       6          1         4        3       1
      67
         DF3       6          1         3        1       1
      71
         DF3       6          1         4        2       2
      78
         DF3       6          1         4        3       2
      82
         DF3       6          1         4        1       3
      98
         DF3       6          1         3        2       2 
      109
         DF3       6          1         4        2       1
      139
         DF3       6          1         3        1       3
      214
         DF3       6          1         5        1       1
      231
         DF3       6          1         5        1       3
      256
         DF3       6          1         3        3       2
      263
         DF3       6          1         5        1       2

      Your browser does not support canvas

      This index file had to constructed as each "record" was written to the disk drive. For a new "record", the values of the "key" and "disk address" pair were added to the the "index file". When a program wished to retrieve a "record" by using its "key", the program had to:

      1. access the "index file"
        The "index file" is stored in memory (most of the time) and NOT on disk -- as this would require an additional amount of disk access.
      2. find the "key" in the list
        This task was easy to accomplish if the "index file" was sorted in ascending "key" value order. This also necessitated resorting the list each time a new "record" was added to the "index file". Often other strategies were employed to allow the "key" value to be easily found in the list (e.g., binary search, search trees, linked lists, hashing, B-trees, etc.)
      3. retrieve the record from the disk using the "disk address"

      In programs, the difference in QSAM and ISAM can be seen in the COBOL statements that read a "record". In QSAM (sequential) files, records are read in physical sequential order (like a deck of punched cards, records on a tape, or records on disk which are read in the order in which the were physically written). In COBOL a record from a sequential file is retrieved like this:

      READ file-name RECORD AT END statements

      here:

      ...file-name is the name of the file as known to the operating system and defined in program
      ...the AT END clause is followed by a list of statements to be performed when the end of file is encountered

      For ISAM, the COBOL statement to read a record is:

      READ file-name KEY key-name INVALID KEY statements


      ...file-name is the name of the file as known to the operating system and defined in program
         (the file also has an additional index file associated with it)
      ...key-name is a program variable that contains the value of the "key" the user is trying to retrieve
      ...INVALID KEY clause is followed by a list of statements to performed if the value of the "key" if NOT found

      Older developers will recall with fondness, the IBM utility programs: IEBGENER used to create files; IEBISAM which performed a re-sort of the index file; b>IEHMOVE used to "move" files;

      ISAM became a critical part of early programming languages: FORTRAN, COBOL and eventually C. Each of these steps underwent improvements and changes as the size of the files and the capacities of the disk drives improved.

    2. VSAM

      The second major disk access method invented by IBM was VSAM (Virtual Sequential Access Method) in the late 1970s. This technique in general added "empty" space in the index list so that new records could be inserted into the indexed file without having to resort the index file.

    3. IMS by IBM (and Vern Watts), 1968

      Even though ISAM -- and eventually VSAM -- provided IBM customers with direct data access by "key", most file processing designs and decisions were made by programmers at the lowest level of sofware and hardware. Based on the work of IBM employee Vern Watts, the IMS Database was developed. IMS became the first commercial viable "full function" database product. It was based in the "hierarchical" model where data is represented as a tree-like structure (as opposed to Codd's "relational" model of tables -- see below).

      The hierarchial model is like the XML discusssions we have had elsewhere. The "parent-child" relationship of XMl are representative of the type of structure IMS contains (see this on IMS).

      IMS was an extremely successful product for IBM and became the IBM's main database product for several years.

    ...and then...

    Edgar Codd developed the key idea for relational databases while at IBM. His seminal paper appeared in 1970: "A Relational Model of Data for Large Shared Data Banks" (in Communications of the ACM Volume 13 Number 6 pg 377, 1970) after an internal IBM paper one year earlier.

    He also is credited with "normalization":

    • 1NF -- Codd's definition of 1NF makes reference to the concept of 'atomicity' (paper cited above)
    • 2NF -- Codd's paper: "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971).
    • 3NF -- also in 2nd Codd's paper cited above

    Most famous quip about database "...so help me Codd."

    1. System R by IBM 1974

      The move from hierarchical databases to the relational model began at IBM. System R was an experimental database product that was designed by IBM in the early 1970s (see this). The key development was SQL (Structured Query Language) by by Donald D. Chamberlin and Raymond F. Boyce in the early 1970s:

      1. 1974-75 -- single user subset of SQL
      2. 1976-77 -- multi-user design that became SYSTEM R
      3. 1978-79 -- on-site experimental installation

    2. INGRES (INteractive Graphics REtrieval System), 1974 and POSTGRES, 1985

      Michael Stonebraker and Eugene Wong of University of California Berkeley became interested in the System R work. With modest support from NSF, , Air Force Office of Scientific Research, the Army Research Office, and the Navy Electronic Systems Command, students and staff developed INGRES in 1974. By 1980 over a thousand copies of INGRES had been distributed (for free) mostly to universities.

      Berkeley students Jerry Held and later Karel Youseffi (at Tandem Computers) built a system that evolved into NonStop SQL. The Tandem database system was a re-implementation of the Ingres technology. By 1989 Tandem was competing with Oracle and lost market share because of Oracle's massive marketing efforts.

      Robert Epstien was the lead programmer at Berkeley and later founded to Sybase (see below)

      In 1980, Stonebraker and Wong and Lawrence Rowe founded Relational Technology, Inc. (RTI) and renamed their company INGRES in the late 1980s. In 1994, INGRES was purchased by Computer Associates.

      Stonebraker returned to Berkeley in 1985 and began an improved version of INGRES whose prototype of 1988 became in the POSTGRES. In 1996 the project was renamed PostgreSQL. Since 1997 the code has been "open source".

    3. ORACLE 1977

      Larry Ellison heard about the IBM System R database from an article in the IBM Research Journal provided by Ed Oates (a future co-founder of Oracle). Ellison co-founded Oracle Corporation in 1977 with Bob Miner and Ed Oates under the name "Software Development Laboratories" (SDL). ORACLE version 1 was released in 1978 for the Digital Equipment Corporation PDP 11 hardware. In 1979 SDL changed its name to Relational Software, Inc. (RSI). In 1982, RSI renamed itself Oracle Systems Corporation. In the same year ORACLE was upgraded for the new Digital Equipment VAX hardware system.

      Oracle continued its dominance in the non-IBM marketplace for years. With the rise of the web and the client-server architecture, ORACLE took over as the top commercial database product through aggressive marketing and acquisitions such as: Peoplesoft in 2004; Siebel in 2005; Hyperion in 2007; Sun Microsystems (along with all the JAVA patents) in 2012.

    4. INFORMIX 1981

      From the Relational Database Systems, Inc. founded by Roger Sippl and Laura King, 1980
      INFORMation on unIX in 1981.
      In 1996 Michael Stonebraker became Informix's CTO, a position he held until September 2000.
      Purchased by IBM 2001

    5. DB2 by IBM, 1983

      The most successful implementation of the relation model was IBM's relational product DB2 which was released for it mainframe customers in 1983.

    6. SYBASE 1984

      Founded 1984 by Mark Hoffman, Bob Epstein (lead programmer at INGRES above), Jane Doughty and Tom Haggin in Epstein’s home in Berkeley, California
      Sybase's SQL Server 4.9 was used by Microsoft to create Microsoft's SQL Server first used circa 1989
      (Microsoft's SQL Server carried Sybase copyright notices until 1997)
      Acquired by SAP 2010

    7. Microsoft SQL Server, 1989

      Adapted from Sybase Version 4.9 in 1989, the product became Microsoft's flagship database.

    8. MySQL, 1995

      MySQL was created by a Swedish company, MySQL AB, founded by David Axmark, Allan Larsson and Michael "Monty" Widenius. The first version of MySQL appeared on 23 May 1995. Based on the low-level language ISAM, which the creators considered too slow and inflexible. They created a new SQL interface, while keeping the same API as mSQL.

      In January 2008, Sun Microsystems bought MySQL for $1 billion. Then, in April 2009, Oracle Corporation entered into an agreement to purchase Sun Microsystems,

      A movement against Oracle's acquisition of MySQL, to "Save MySQL"from Oracle was started by one of the MySQL founders, Monty Widenius.

    The Top Ten DBMS systems from http://db-engines.com/en/ranking (all but MongoDB and Cassandra are Relational DBMS) are shown below:

    1. Oracle
    2. MySQl
    3. Microsoft's SQL Server
    4. PostgreSQL
    5. MongoDB (document store, see below)
    6. DB2
    7. Microsoft Access
    8. SQLite
    9. Sybase ASE
    10. Cassandra (wide column store)

  2. SHARDING

    Most applications begin to perform slowly when the query volume exceeds the server/disk performance capability. The noticeable slowdown in response time necessitates either "fine tuning" the database structure or in many cases splitting the database into several parts and assigning each portion of the database to its own separate server/disk configuration.

    This process is called "horizontal partitioning" because the tables are divided along row boundaries. For example, one might split a database table that whose primary key is "LastName" into equal fourths by creating a server and disk for last names that begin with : A→-F, G→L; M→R, and S→Z. Each of the partitions can also be referred top as a "shard" as in a piece of broken glass (i.e., fragment, sliver, splinter, shiver, chip, piece, bit, particle, ...)

    There are obvious performance issues with this idea:

    Automatic sharding is available in: IBM's INFORMIX (version 12.1) , MongoDB (version 1.6), MySQL Cluster, MS SQL Azure, and Spanner (by Google)

  3. SPEED

    MAKING DATABASES GO FAST!

    Just add lotsa dollars and a few terabytes of memory

    A second major issue with database speed performance arises from the failure of database indices to be stored in memory. With very large databases, the size of the key index files also increase. If the index file is too big to fit in memory, throughput in the database processing slows because the index (the disk location of the key) must be read first from disk, then the actual as well. This can create significant speed degredation downs.

    The realistic solution to this problem has always been to expand the amount of memory. But at some point, the size of the database (and its index structure) exceeds the memory size of conventional servers. This has given rise to the "database engine" hardware and "database in-memory" solutions.

    • Database Engines

      By eliminating the disk drives (as animated above), rotational delay, seek time and transfer time goes away and the transaction speed can be increased by several orders of magnitude. This is clearly the future of database speed! All brought to you by the incredibly low cost of memory.

      For example, Oracle offers the "Oracle Database Appliance X4-2" here which has: two Intel 12 core processors and 18 terabytes of raw storage with expansion options for up to 36 terabytes of data.

    • In-Memory Databases

      "Oracle TimesTen In-Memory Database" is a large memory server where the entire database is stored in memory.

      SAP offers HANA (High-Performance ANalytic Appliance (HANA 1.0, 2010, HANA Enterprise 1.0, 2013). Costs for the database hardware solutions depend on the database size, but one can expect these to run into multi-million dollar solutions.

    All of these solutions are predicated on the change from "rotating storage" (traditional disk drives) to "solid-state" disk drives.

    Costs will continue to drop as memory costs decrease.

  4. NOSQL

    MongoDB

    More recently, during 2010 and 2011, Michael Stonebraker has been a critic of the NoSQL movement.

  5. NEWSQL

http://db-engines.com/en/ranking