OS/2 eZine - The Best OS/2 Reading Anywhere!

16 December 2001

ODBC on OS/2 Part 9

The ODBC series up to now has been oriented towards the general user, that is people that use ODBC applications. Section 1 (parts 1 & 2) described what ODBC is, how it works and how to install it on OS/2. Section 2 (parts 3 - 8) covered all the ODBC database drivers and associated database servers or clients available for OS/2. With this last section (which will probably span parts 9-12) the orientation of the articles changes more to programmers rather than general users. In this section we will discuss what ODBC is and how it works from a programmer's perspective, and issues related to writing ODBC enabled applications. What we will not be covering is how to write ODBC database drivers, which is a very different programming problem than writing applications that use ODBC to access data. This section won't be a tutorial on programming ODBC, but rather an introduction to ODBC and how it fits into OS/2.

The issues faced by programmers writing ODBC applications broadly fall into two areas:

Understanding how ODBC works and how to write applications that work with multiple databases,
Understanding how a particular language interfaces to ODBC or implements ODBC calls; in other words how to use ODBC from a specific language.

The first issue - understanding how ODBC works (programmatically) and how to write interoperable ODBC applications - will be covered in this article, part 9. Issues specific to different languages will be covered in the following months. The plan is:

Part 10 will cover using C to write directly to the ODBC API. I will also show a set of VisualAge C++ classes I have written that simplify ODBC programming for C++ programmers.
Part 11 will cover ODBC programming in Rexx. Each of the big Rexx programming environments (VxRexx, VisProRexx, GPF Rexx) included a Rexx interface to ODBC, although all require the programming environment for the Rexx-ODBC interface to work. We will also look at a Rexx-ODBC interface that uses SOM to provide the bridge between Rexx and ODBC, using Object Rexx's ability to call SOM Objects.
Part 12 will cover Object Pascal's ODBC interface, as implemented by Sybil. It will also wrap up the series.

It may seem strange to be talking about ODBC programming in languages other than C, since the ODBC API is actually a C language API. But other languages have built interfaces to ODBC, interfaces that range from providing most of the power of the ODBC API, all the way down to just providing the ability to pass SQL statements and get results. The list above is not all-inclusive: for instance IBM released a rather short lived language called Visual Basic which was an attempt to create a version of BASIC with objects which had an ODBC interface.

In order to better understand what ODBC is and how ODBC works it helps to place the ODBC API in the context of other APIs, so we will look at embedded SQL programming and generic CLI programming before moving on to ODBC.

What is SQL?

SQL is a language used to query and manipulate databases. It was designed to simplify and standardize how a programmer could get at stored data, and to provide a way for data users to query their data. SQL and relational databases grew up together and depend on each other. Both are required to solve the problems that up to then existed, namely that the users of the data could not write the programs needed to access the data. The reason they couldn't was that data was stored in all sorts of specialized structures that each had their own unique access methods and programming techniques. The system programmers that created the data storage structures/files also created the programs that retrieved the data. The users of the data had to rely on the system programmers to write new queries for accessing data each time the users' needs changed. The goal of SQL was to replace the need for custom data retrieval programs with a standardized language that was general enough to handle all ad-hoc data access requirements.

Since the goal was to create a language that simplified and standardized how to access data, the language was designed to be non-procedural. A procedural language is one where the desired result is defined by a number of ordered steps - the program. A non-procedural language is where the desired result is described by a single language statement. The thought was that many users of the data (lawyers, accountants, etc.) would not be capable of writing program loops in order to retrieve data. So the non-procedural nature of SQL allowed the user to ask a single question, albeit a sometimes complex question, in the form of a single SQL statement.

The non-procedural aspects of SQL pose some problems which were recognized early on by database vendors. There are some problems that are either impossible or very difficult to do with only SQL, which become possible or much easier within a procedural context. Therefore some database vendors created procedural "dialects" of SQL. For example Sybase has Transact-SQL, Oracle has PL/SQL and Informix has SPL; each of which has normal flow-control devices (IF-THEN, loops, etc.)

But even with procedural constructs added to SQL the language cannot stand on its own; it has to be employed within or by another programming language. SQL has no output facilities, no cout , printf() or say statements. The data retrieved from the database by SQL has to be transferred to another language in order to be sent to the printer, viewed on the screen, or to be otherwise generally useful.

How Does SQL Interface with a Programming Language

If SQL is the language used to talk to the database, how does a programming language send SQL statements to the database? And how does the data from the database get from SQL into the programming language? The answers to those two questions depend on the method used to interface SQL into a programming language.

There are generally two methods of interfacing SQL in a programming language.

Embedding the SQL statements into the language. In this method the SQL statements are placed in the language with special markers that identify the statements as SQL, and other special markers that list the programming language variables that are to be used to store the resulting data returned from the database. The program is then run through a preprocessor or precompiler that replaces the SQL statement with one or more database function calls that do the actual data retrieval/manipulation. While this method could theoretically work with any type of language it has so far only appeared in compiled languages - with the exception of Java which really floats between being an interpreted language and a compiled language.
Passing SQL statements in function calls, where the SQL statement is a parameter in the function. If the SQL statement returns data from the database other function calls are needed to specify which programming language variables are to be used to store the returned data.

The embedded method is the oldest method and is the method that SQL was designed around. Using functions to pass SQL statements to the database is a later development. This type of interface is called the Call Level Interface (CLI) and has also been standardized in both the X/Open CAE Specification "Data Management: SQL Call-Level Interface (CLI)" and ISO/IEC 9075-3:1995 (E) Call-Level Interface (SQL/CLI).

Writing programs is easier with the embedded method because it tracks closer to SQL than the function call method. However compiling programs is much easier with the function call/CLI method because the precompiler step is avoided, along with the expansion of the SQL statement(s) into multiple programming language statements and functions. Getting a programming language syntax error with the embedded method can be a nightmare to resolve because of all that extra stuff inserted into the program by the precompiler.

ODBC is based on the CLI method but it adds some additional features, which we will see below.

It helps to understand the added complexity of the CLI method by comparing it to embedded SQL programming. Before we can do that however we must stop for a moment and talk about the SQL SELECT statement. Although this is not a SQL tutorial I need to show you enough to be able to explain the embedded and CLI methods.

The SELECT statement in SQL retrieves data from a table in a database. The form of the statement is:

SELECT column_name,column _ name ,... FROM table_name WHERE condition

Column_name is the name(s) of the columns from the table you want to retrieve. The WHERE clause is optional and allows for specifying a condition that limits the rows from the table that are retrieved.

For example if we wanted to look at all names and addresses of all the democratic voters in the city of Littleton we could write a select statement that says

select first_name,last_name,address from voters where city='LITTLETON' and party = `DEMOCRAT'

This would retrieve about 5,856 rows from a table of 22,656 (voters.html) registered votes - those 5856 rows are called a "result set". So far so good. But how will we deal with 5856 rows of data? We really need to look at each individual row, or send each individual row to a printer. SQL has a construct called a cursor , which is a result set associated with a pointer. We can use the pointer to select single rows from the result set and get the data from the columns on that row. The way that is done is (and here SQL's non-procedural nature seems to change to something that sure seems procedural-like):

DECLARE a cursor, i.e. create a cursor from a select statement.
OPEN the cursor - which causes the result set to be created
FETCH the cursor data INTO some host programming language variable(s)

Each time the FETCH statement is executed the cursor pointer is moved to the next row and the data is copied to the host programming language variables that are specified. Traditionally the pointer can only move forward through the result set and can only move one row at a time. Newer databases add the ability to move the pointer either forwards or backwards and by some arbitrary number of rows in either direction. (ODBC has something called a cursor library that will provide the backward and arbitrary movement functions if the database doesn't support those features.)

Now lets look at an example of the embedded method.

Embedded SQL

Embedded SQL can be used in whatever languages the database vendor supports. Traditionally the languages used with embedded SQL are C, C++, COBOL, FORTRAN, and Java. On OS/2 both DB2 and Oracle support a Rexx interface that looks something like embedded SQL but is really a cross between embedded and CLI.

The example below is a simple C program that opens a cursor requesting all the democratic voters from the city of Littleton, and prints out the results. This program would have to be run through a precompiler, supplied by the database vendor. The output from the precompiler would then be compiled.

EXEC SQL INCLUDE SQLCA;
EXEC SQL BEGIN DECLARE SECTION;
char fname[254];
char lname[254];
char address[254];
EXEC SQL END DECLARE SECTION;

int rn=0;

/* connect to the database */
EXEC SQL CONNECT "DBA"

/* create cursor for SELECT statement */
EXEC SQL DECLARE CURSOR c1 FOR select first_name,last_name,address from voters
where city='LITTLETON' and party = `DEMOCRAT';

EXEC SQL OPEN CURSOR c1;

while (1) /* start a loop */
{
/* execute the select statement */
EXEC SQL FETCH c1 INTO :fname, :lname, :address;
/* break out of the loop when no more data is indicated */
if (SQLCODE != 0) break;
rn++; /* increment a row counter */
/* print out the row number, first name, last name and address */
printf("row %ld Name: %s %s Address: %s\n",rn,fname,lname.address);
}

Notes on the code.

The SQLCA is a structure that is used to communicate results (error codes or success codes) from the database.
The C variables we want to use to communicate with SQL are declared in the DECLARE SECTION so the precompiler can recognize them. This section is required even though we identify which variables we want to use in the FETCH statement by prefixing the variable name with a colon - these are called host variables. The reason for this section is the precompiler needs to know the type and size of host variables so that it can generate the code needed to convert from the database type into the host variable type. We will discuss type matching between the host language and variables later on below.
The cursor that we create (with the DECLARE) is named C1. Cursor names allow multiple cursors to be DECLARED, OPENED and FETCHED at the same time since each cursor is referenced by its name.
The SQLCODE is an element in the SQLCA structure that was declared at the top of the example. It contains a numeric value to indicate success or failure.

With the embedded interface SQL integrates rather nicely with a programming language, in this example C. It is fairly straightforward to send SQL statements to the database and get the data from the database into the programming language (the host language) variables. The disadvantage is that the program will only run against the database it is precompiled for. This is because the precompiler is specific to a database; each database vendor uses their own precompiler that only works on their database. This also means that a single program cannot easily access more that one database; to access more than one vendor's database in a single program you would have to break the program into separate modules for each database, precompile and compile each module separately and then link the modules together into a single program.

Call-Level Interface (Function Calls)

This is an example of a CLI type program that does the same thing as the embedded program above. It retrieves and prints out the names and addresses of democratic voters registered in the city of Littleton. If this program were complete it could be compiled either as a DB2 CLI program or as an ODBC version 2.x program. As in the embedded example above all error checking has been left out for clarity.

The sequence is somewhat more involved. The basic steps are:

Allocate an environment handle
Allocation a connection handle
Connect - using the connection handle
Allocate a statement handle
Execute a statement - using the statement handle. For select statements this single step replaces the DECLARE, and OPEN steps of embedded SQL.

/* handles */
HENV henv;
HDBC hdbc;
HSTMT hstmt;
RETCODE rc;
/* variables to hold data returned from database */
char fname[254];
char lname[254];
char address[254];
/* stores length of data return from fetch */
SDWORD fnameLen;
SDWORD lnameLen;
SDWORD addressLen;
/* row counter variable */
int rn;

/* allocate an environment handle */
rc = SQLAllocEnv(&henv);

/* allocate a connection handle */
rc = SQLAllocConnect(henv,&hdbc);
/* make the connection */
rc = SQLConnect(hdbc,"DBA",SQL_NTS,NULL,NULL,NULL,NULL);

/* allocate a statement handle */
rc = SQLAllocStmt(hdbc,&hstmt);

/* now execute the select statement */
rc = SQLExecDirect(hstmt,"select first_name,last_name,address from voters
where city='LITTLETON' and party = 'DEMOCRAT',SQL_NTS);

/* now specify which C variables are to hold the data */
rc = SQLBindCol(hstmt,1,SQL_C_CHAR,fname,254,&fnameLen);
rc = SQLBindCol(hstmt,2,SQL_C_CHAR,lname,254,&lnameLen);
rc = SQLBindCol(hstmt,3,SQL_C_CHAR,address,254,&addressLen);

while(1)
{
rc = SQLFetch(hstmt)
if (rc!=0) break;
/* increment the row counter */
rn++;
/* print out the row number, first name, last name and address */
printf("row %ld Name: %s %s Address: %s\n",rn,fname,lname.address);
}

Notes on the code.

CLI programs require "handles" to an environment, a connection and a statement. These handles are used to link connections to an environment and SQL statements to a connection. The process of "allocating" the handle actually sets up some structures in that are needed by the database manager. The use of handles allows multiple environments, connections and statements to be active at the same time - depending on the limitations of the database driver.
For those unfamiliar with C, the & in front of a parameter name in a C function essentially makes that parameter a "pass by name" rather than the C default of "pass by value"
The SQL_NTS value passed as a parameter indicates that the string containing the SQL statement is a Null Terminated String. Otherwise this parameter would have to be the length of the statement/string. SQL_NTS and SQL_C_CHAR are special constant values defined in the CLI header files. HENV, HDBC, HSTMT etc. are special names given to normal C data types that are defined in the CLI or ODBC header files.
The purpose of the SQLBindCol function calls is to associate a host language variable with a column from the result set returned by the database. The first parameter identifies the statement we are working with. The second parameter identifies which column in the result set we want to associate; the columns in the result set are numbered from left to right starting at column 1. The third parameter identities the type of the host variable and the fourth parameter specifies the name of the host variable. The fifth parameter is the maximum length of data the host variable can hold and the sixth parameter is actually an output parameter. The number of characters copied from the result set column into the host variable are placed here, or else an indicator saying that column for that row had no data - i.e. a NULL indicator. We didn't show how nulls are indicated in the embedded example above to keep things simple.
Where the embedded code uses the SQLCODE special variable to communicate errors and the end of the data, the CLI program uses the return code from the SQLFetch function.

The CLI program requires more code to accomplish the same thing. For example, in embedded programming we can simply specify which host variable we want to use in the FETCH statement; with CLI programming we have to call a function for each host variable which is used to define the name, type and size of the host variable - things that the precompiler figures out (type and size anyway) for us automatically. The trade-offs are:

CLI programming is more portable. The error codes and methods for discovering information about the result set are standard and portable among different databases.
A single application can be written to access multiple vendors' databases with a single set of routines; that application can dynamically load the CLI library needed to access a specific database at run time, rather than having to distribute separate binaries of the application for each database the application is going to access.
Enhanced performance. Multiple rows can be fetched from the result set at the same time. In the small example code shown in the embedded and CLI sections, we saw that the fetch is called once for each row of data in the result set - in our examples above we had 5856 rows so we called the fetch function 5865 times. in CLI an extended fetch function exists that retrieves multiple rows from a result set and copies the data from those multiple rows into a set of array variables, all in a single function call. This cuts down on function calls and separate network accesses.

ODBC & CLI

When ODBC was created it was based on the emerging CLI standard, although it did not completely implement that preliminary standard. After the introduction of ODBC the CLI standard was modified and finalized into the standard that exists now. ODBC was then changed (in version 3) to match the CLI standard - it fully implements the CLI standard. If a program is written to the CLI standard it is guaranteed to work with ODBC version 3 if it is compiled with ODBC headers and is called by the ODBC Driver Manager. With DB2 the same is also true with ODBC version 2.x; with some #defines set you can compile an ODBC version 2.x program with the DB2 CLI headers and have it work with DB2.

ODBC is also a superset of the Call-Level Interface, meaning that ODBC provides some additional functions and features that do not appear in the CLI standard. These additional features include: the Driver Manager, code-page translation libraries, functions that report the database driver's ANSI SQL conformance and support, escape sequences for various literal values and scalar functions, and standardized scalar functions. Each of these additional features is discussed below.

Driver Manager

(This section is mostly quoted from Microsoft's ODBC 3.0 Programmer's Reference and SDK Guide.)

The Driver Manager is a library that manages communications between applications and drivers (meaning here CLI libraries). The Driver Manager exists mainly as a convenience to application writers and solves a number of problems common to all applications. These include determining which driver (CLI library) to load based on a data source name, loading and unloading drivers, and calling functions in drivers.

To see why the latter is a problem, consider what would happen if the application called functions in the driver directly - which is what happens in normal CLI programming. Unless the application was linked directly to a particular driver, it would have to build a table of pointers to the functions in that driver and call those functions by pointer. Using the same code for more than one driver at a time would add yet another level of complexity. The application would first have to set a function pointer to point to the correct function in the correct driver, and then call the function through that pointer.

The Driver Manager solves this problem by providing a single place to call each function. The application is linked to the Driver Manager, not the driver. The application identifies the target driver and builds a table of pointers to the functions in that driver. It uses the connection handle passed by the application to find the address of the function in the target driver and calls that function by address.

For the most part the Driver Manager just passes function calls from the application to the correct driver. However it also implements some functions itself - mainly those that query the list of data source names and drivers installed - and performs basic error checking. For example, the Driver Manager checks that handles are not null pointers, that functions are called in the correct order, and that certain function arguments are valid.

The final role of the Driver Manager is loading and unloading drivers. The application loads and unloads only the Driver Manager. When it wants to use a particular driver it calls a connection function in the Driver Manager and specifies a data source name (or a driver type). Using this name the Driver Manager searches the data source information (on OS/2 this is ODBC.INI) for the driver's file. It then loads the driver (assuming it is not already loaded), stores the address of each function in the driver, and calls the connection function in the driver, which then initializes itself and connects to the data source. When the application is done using the driver it calls a disconnect function in the Driver Manager which calls that function in the driver and then unloads the driver from memory.

Code-Page Translation Libraries

ODBC provides a facility where the user can specify a code-page translation library to be loaded and used for a specific driver. This library then translates characters from one code page to another - i.e. from the code page used by the database to the code page used on the client machine.

Capability Reporting Functions

Different databases can have dramatically different capabilities. These differences range from the data types supported by the database to the SQL capabilities of the database. Even though ODBC (or CLI) fixes the API differences between databases with a common standard, it does not force all the databases to behave the same way. Applications that are intended to run on more than one database still must take these differences into account. For example some databases that have a Call-Level Interface or ODBC database driver do not support outer joins, others that support outer joins use different syntax (e.g. Oracle). The application must be able to recognize what the capabilities are of the database it is talking to and adjust itself accordingly. ODBC provides some functions for querying a database for its capabilities to allow an application to work well with multiple database products.

Escape Sequences

One of the biggest and most troubling differences between databases is how they handle dates and times. Some databases (e.g. Oracle, Sybase, etc.) include a time element with date values, some strictly separate dates and times into separate data types. Most have different syntax for specifying a date value. For example date values are specified as `MM-DD-YYYY' (`12-25-2001') in DB2 and as `DD-MMM-YYYY' in Oracle (`25-Dec-2001'). ODBC provides escape sequences for representing date and time values which are then translated by the database driver into whatever syntax the database requires. These escape sequences can be used when specifying a date (or time or timestamp) to make the date portable across different database products. (ODBC also always returns the date part of the value in the format - YYYY-MM-DD, which simplifies working with date values).

Escape sequences are provided for date, time and timestamp values, scalar function calls, LIKE escape characters (a way of specifying _ or % as literal characters to be searched), outer joins and procedure calls.

Date escape sequence is {d `YYYY-MM-DD'}, example {d `2001-12-25'}

Time escape sequence is {t `hh:mm:ss'}, example {t `23:04:01'}

Timestamp sequence is {ts `YYYY-MM-DD hh:mm:ss'}, example {ts `2001-12-25 23:04:01'}

An example of a date escape sequence in a select statement is

select first_name, last_name from voters where registered_date after {d `1999-01-01'}

The escape sequence for ODBC defined scalar functions is {fn scalar-function }, for example {fn UCASE(NAME)}. The ODBC defined scalar functions are discussed below.

The outer join escape sequence is {oj outer-join } where outer-join is:

table-reference LEFT OUTER JOIN { table-reference | outer-join } ON search-condition

for example

select name, deptname from {oj employee left outer join dept on employee.deptid=dept.deptid} where employee.projnum = 544

Standardized Scalar Functions

ODBC defines a standard set of scalar functions that are to be provided by the database. Not all databases provide all the scalar functions, but when a database provides the capability of a scalar function the ODBC database driver will implement the function. The advantage of using the ODBC defined functions, along with the escape sequence, is that the syntax is standardized across databases. The driver is responsible for translating the ODBC function into a native database function, or implementing the function itself. Database drivers generally state which functions they support, and ODBC API functions can be used to query a driver for which scalar functions it supports.

Scalar functions are functions that can be used in place of a column name in a select statement.

I will just list the scalar functions here because the purpose and usage of most are so obvious explanation is not necessary.

ASCII( string_exp )
CHAR( code )
CONCAT( string1, string2 )
DIFFERENCE( string1, string2 )
INSERT( string1, start, length string2 )
LCASE( string )
LEFT( string, count )
LENGTH( string )
LOCATE( string1, string2[, start] )
LTRIM( string )
REPEAT( string,count )
REPLACE( haystack,needle, new_needle )
RIGHT( string, count )
RTRIM( string )
SPACE( count )
UCASE( string )
ABS( number )
ACOS( float )
ASIN( float )
ATAN( float )
ATAN2( float1, float2 )
CEILING( number )
COS( float )
COT( float )
DEGREES( number )
EXP( float )
FLOOR( number )
LOG( float )
LOG10( float )
MOD( integer1, integer2 )
PI()
POWER( number,integer )
RADIANS( number )
RAND( integer )
ROUND( number, integer )
SIGN( number )
SIN( float )
SQRT( float )
TAN( float )
TRUNCATE( number,integer )
CURDATE()
CURTIME()
DAYNAME( date )
DAYOFMONTH( date )
DAYOFWEEK( date )
DAYOFYEAR( date )
HOUR( time )
MINUTE( time )
MONTH( date )
MONTHNAME( date )
NOW()
QUARTER( date )
SECOND( time )
TIMESTAMPADD( interval, integer, timestamp )
TIMESTAMPDIFF( interval, timestamp1, timestamp2 )
WEEK( date )
YEAR( date )
DATABASE()
IFNULL()
USER()
CONVERT( value, data_type)

ODBC Versions

ODBC has gone through three major versions with a number of minor releases for each major version. The last version on OS/2 is version 3. The first version that was available as a SDK (Software Developers Kit) was version 2. So in OS/2 we mostly do not have to worry about version 1.

An application written to the ODBC API of a certain version requires a Driver Manager at that version or higher and a database driver at that version or higher. Meaning that an application written and compiled to version 2.x will work with a version 3 Driver Manager and database driver; but an application written and compiled to version 3 will not work with a version 2.x Driver Manager or database driver.

The major differences between version 2.x and version 3.x are those needed to bring ODBC into compliance with the CLI standards, which mostly revolved around changing function names a number of #DEFINES changing from version 2 to version 3. Version 3 really doesn't add much in the way of additional capabilities.

Since there are a number of ODBC version 2.x database drivers in OS/2 that have never been updated to version 3, it is probably best to write applications to the version 2.5 API. This way the application will be compatible with the widest range of database drivers. The one exception to this might be if you are trying to write code that is source compatible with both the ODBC and the CLI standard, so the same application could be used with or without ODBC (although there is really not much advantage to not using ODBC). To be source compatible with most database vendors that conform to the CLI standards, other than DB2, you would have to code your application to ODBC version 3. With DB2 you can write the application to ODBC version 2.5 and still have it compile as a DB2 CLI application.

If you are using a language other than C or C++ you probably don't have a choice. The language will have its data access part written to a version of ODBC and that is what you are stuck with. But it won't matter to you anyway because the language will mask any differences between versions.

Data Types

For languages that have data types, programming around the different data types in different databases, and the mapping of a database's data types to the variable types of the programming language can be a major chore. The headaches are not only caused by the real differences in data types but also by the different names used by the different databases for the same type of data. The difficultly is slightly helped by ODBC having a finite set of datatypes, as shown below.

#define SQL_CHAR 1
#define SQL_NUMERIC 2
#define SQL_DECIMAL 3
#define SQL_INTEGER 4
#define SQL_SMALLINT 5
#define SQL_FLOAT 6
#define SQL_REAL 7
#define SQL_DOUBLE 8
#define SQL_VARCHAR 12
#define SQL_DATE 9
#define SQL_TIME 10
#define SQL_TIMESTAMP 11
#define SQL_LONGVARCHAR (-1)
#define SQL_BINARY (-2)
#define SQL_VARBINARY (-3)
#define SQL_LONGVARBINARY (-4)
#define SQL_BIGINT (-5)
#define SQL_TINYINT (-6)
#define SQL_BIT (-7)

ODBC functions that return the structure of a table and the characteristics of a result set, return both the database name for the column type and the ODBC type mapped to that column. This helps, but it is up to the driver how to map a particular database type to an ODBC type. For example some Oracle database drivers return SQL_DATE for an Oracle DATE column; others return the ODBC type as SQL_TIMESTAMP. So you end up having to know and program around the database vendor and driver vendor instead is blindly relying on the ODBC types to solve type mismatches.

Some database data types have no equivalent in C. For example DB2 has a decimal type that doesn't really map well into any existing C type; the value loses precision when placed into C float and double types.

When you map a host variable to a column in a result set, you specify the type of the host variable. The database driver is responsible for converting the data from the database type into the host variable type. Likewise if you are using input parameters the driver is responsible for converting from the host variable type to the database data type. (We haven't talked about input parameters, but the concepts are basically the same for input parameters as output parameters.)

Making Applications Interoperable

Making applications interoperable means programming the application in such a way as to insure that it works with a wide variety of different databases. I will offer some general tips that are applicable no matter what language you are using, but there really is no substitute for testing the application on all the databases you intend it to run on.

Always use ODBC escape sequences for date, time, timestamp values, outer joins and any scalar functions you need. Since ODBC escape sequences are coded in to the SQL, you can use them regardless of the language being used.
Be aware that most Unix based databases, e.g. PostgresSQL, use all lower case, where most legacy and non-Unix databases use upper case. This shows up in catalog functions where you query for table definitions and data types. IN SQL92 an identifier can be any case if it is quoted. If it is not quoted it is up to the database whether it is stored as upper case or lower case. Don't depend on it being one case or the other.
Be careful with date values. Dates seem to have the widest variation of all the data types.
Reuse connections. Making a connection to a database is an extremely time and resource intensive operation. Minimize the number of connections your application makes by using the same connection for everything you can. This will not only speed up performance but save you from those drivers that do not support multiple connections.
Use the fewest ODBC functions that you can get by with. What I mean is: some applications query the driver to see what capabilities it supports and then use the advanced capabilities if they are present, and use work-arounds if the advanced capabilities are not present. I recommend always using the work-arounds if possible and not even querying for the advanced capabilities. This is because some drivers erroneously report capabilities they don't have, and some drivers fail when queried about some capabilities. (This happens most commonly with open-source drivers.) To avoid these problems, just don't use and don't even ask about advanced stuff if you can live without it. This tip and the one below it probably only apply to C language programs using the ODBC API directly.
Check what the default environment is (auto or manual commit, isolation level, etc.), or else set the environment to a known state when you allocate an environment.

Conclusion and Summary

SQL needs to be used with a programming language to be useful. ODBC is one method of integrating SQL into a programming language. ODBC also provides some very valuable capabilities to help make database applications work against multiple databases, and it greatly simplifies writing applications that access different databases at the same time. Even with ODBC capabilities it takes careful attention to details to write an application that works across multiple databases. The truly excellent database application is the one that works well against databases that didn't exist when it was written.

This month we tried to show some of why ODBC is valuable from a programmer's perspective, some basic concepts behind ODBC and some features of ODBC that can and should be used regardless of the language used for writing an ODBC enabled database application. Next month we will discuss writing ODBC applications in C and C++, how to compile the application and where to get the headers and libraries needed. Our intent is not to provide a tutorial on ODBC programming, but rather to explain the mechanics of writing ODBC applications on OS/2. There are books that already exist on writing ODBC applications, and the ODBC reference manuals from Microsoft are actually pretty good themselves.

Footnotes

1 - Database Principles, Programming, and Performance page 26; by O`Neil and O`Neil ISBN: 1-55860-438-3.

Douglas Clark (mailto:dgclark@attglobal.net) is a program management consultant who first started using OS/2 version 1.3. He's married, with 2 girls, and is old enough to remember when 4 color mainframe terminals were a big thing.

This article is courtesy of www.os2ezine.com. You can view it online at http://www.os2ezine.com/20011216/page_10.html.