![]()
Compatibility Issues Between DataFlex Databases and
Databases Accessed Through CLI Based Connectivity Kits Version 2.1
A Data Access
Worldwide White Paper
by Ben Weijers
August 2002
Last Edited: September 23, 2002
Overview First of all we need to define what exactly a compatibility issue is. For
the purpose of this document we define a compatibility issue as: A situation
where a certain technique used in a DataFlex program gets a different result
when using the DataFlex database than when using a database accessed through CLI
Connectivity Kits.
Result, in this definition, is the end result and does not include the time needed
to get at the end result. Next to the definition above, we would like to introduce performance
issues. For the purpose of this document we define a performance issue as: A situation
where a certain technique used in DataFlex is (much) faster when using the
DataFlex database than when using a database accessed through CLI Connectivity
Kits. Note that performance is only an issue when CLI is slower than DataFlex.
In the cases where it is faster, we do not speak of a performance issue. As
opposed to compatibility issues, the performance issues are not bi-directional.
This is because speed is rated to be a positive thing. Making things faster is
always greeted as an improvement. Making things slower usually is only accepted
if it also introduces some benefit that weighs more than the disadvantage of
losing speed. We will present two lists of issues, compatibility and performance
issues, as defined above. Each individual issue needs to be prioritized on its
own merits. It is not the case that compatibility issues always get a higher
priority than performance issues. Complete compatibility on a non-performing
driver is not acceptable, as is great incompatibility on a fast driver. The issues discussed in this document may or may not apply in a given
environment. Some issues only apply when connecting to existing data, others
always apply. The following sections give a short overview of what may or may
not apply to a given environment. The ODBC Connectivity Kit is not listed in
the following sections because when using ODBC it all depends on the back end.
This document describes compatibility issues between using the DataFlex
database or a database accessed through one of the CLI based Connectivity Kits.
When converting from DataFlex to SQL Server, the following issues can
occur:
·
Overlap
and Underlap Support
·
Case
Sensitive
Index Segments
·
Locking
·
Ascending/Descending
Index Segments
only on pre SQL server
2000 versions.
When attaching to an existing SQL Server table the following issues can
occur:
·
Unique
versus Non-unique
Indexes
·
Locking
·
Zerofile
When converting from DataFlex to DB2, the following issues can occur:
·
Overlap
and Underlap Support
·
Case
Sensitive
Index Segments
·
Locking
·
Zerofile
When attaching to an existing DB2 table the following issues can occur:
·
NULL
Collating
only when NULL values are allowed.
·
Unique
versus Non-unique
Indexes
·
Locking
·
Null
Sorting
only when NULL values are allowed.
·
Zerofile
This chapter discusses compatibility issues between the DataFlex database
and a database accessed through a CLI based Connectivity Kit.
When data is converted from DataFlex to an SQL database, Nullable
columns (columns that will accept a null value) are, by default, never
supported/created. We suggest that you do not change this default. If you do
support Nullable columns and those columns are used in indexes you will need to
be aware of collating and performance issues. We suggest that you avoid using
any Nullable columns and we strongly advise you not to use Nullable columns that
are part of indexes.
For a proper understanding, a few things should be clear. First of all,
DataFlex does not implement a null concept! Secondly, when we say that a column
has the null value we mean that we do not know what the value of the column is.
Thirdly, null values greatly complicate logical expressions. The expression
outcome changes from a binary value (true, false) to a three-valued value (true,
false, unknown). Next to this, there are special syntactical rules for handling
null values. Lastly, ordering a column that allows null values is slower than
ordering the same column that does not allow null values. In other words,
columns that allow nulls in indexes will make finding slower.
Database systems use collating sequences to determine how to sort the
data in alphanumeric columns. In a collating sequence, one specifies what order
characters have relative to each other. Again, null values are special in this
regard. You cannot specify how null values should collate. This is a given
behavior with following possible implementations:
·
SQL_NC_END
= NULLs are sorted at the end of the result set, regardless of the ASC or DESC
keywords.
·
SQL_NC_HIGH
= NULLs are sorted at the high end of the result set, depending on the ASC or
DESC keywords.
·
SQL_NC_LOW
= NULLs are sorted at the low end of the result set, depending on the ASC or
DESC keywords.
·
SQL_NC_START
= NULLs are sorted at the start of the result set, regardless of the ASC or DESC
keywords.
The null collation is SQL_NC_HIGH for DB2, SQL_NC_LOW for SQL Server. For
ODBC, it depends on the back end.
Let us look at a practical sample for a database that uses SQL_NC_HIGH.
Assume a table containing invoices. This table contains a number and a paid
date. These two columns also form an index. Now (only looking at these two
columns) lets assume the following DataFlex data (ordered by the index in
question):
|
Number |
Paid
date |
|
1 |
0 |
|
1 |
01/01/1999 |
|
2 |
10/10/2000 |
|
2 |
10/11/2000 |
|
3 |
0 |
|
3 |
05/05/2001 |
If we convert this with null values allowed, it would result in the
following converted data (also ordered by the index in question).
|
Number |
Paid
date |
|
1 |
01/01/1999 |
|
1 |
0 |
|
2 |
10/10/2000 |
|
2 |
10/11/2000 |
|
3 |
05/05/2001 |
|
3 |
0 |
Now lets look at the find command below and the results in DataFlex and the
CLI back end, respectively:
|
Command |
DataFlex
result |
CLI
result |
|
Clear
X Move
2 To Number Find
Lt X By Index.1 |
1,
01/01/1999 |
2,
10/11/2000 |
The difference in the above results is technically correct. From a technical point of view, there is no problem. We sort by a certain number of columns and we return the records in that order. However, since there is a difference in result, existing applications will not always work correctly when the data is converted to back ends that use SQL_NC_HIGH, SQL_NC_END or SQL_NC_START and null values are allowed.
Any API based driver will require some unique
numeric identifier, the identity, for a row in a table. In CLI databases such an
identifier is not present.
The solution we have chosen is to add an identity
column to a table definition when the table is converted. The identity column is
added at the end of the already existing columns.
The numeric identifier combined with the addition
of the identifier column introduces three incompatibilities.
1.
It is not
possible to connect to tables that do not have a unique numeric identifier.
2.
It is not
possible to convert tables that use recnum relations. The chosen mechanism will
not keep those relations intact.
3. In a mixed environment where the same system runs on DataFlex and some CLI based back end it is not possible to have code that adds columns to an existing table definition result in same table definition. The identifier field will get in the way, thus changing field numbers. Field numbers are the base for working with columns inside a DataFlex application.
DataFlex only supports unique indexes. Most CLI back ends
also support non-unique indexes. If a CLI table has such an index defined, it is
possible that finding may give in some unexpected result. Both SQL Server and
DB2 support non-unique indexes.
Note that this problem does not occur when using
standard converted tables since indexes are forced to be unique during
conversion. The problem could occur when converting with the use of non-unique
index option, adding an index through back end utilities or by connecting to
existing data.
When new indexes are defined or the definition of
an existing index is changed, one can create non-unique indexes. The
Connectivity Kit assumes an index should be non unique when it contains RECNUM
as one of its segments. When such an index is created on the back end, the
RECNUM segment is stripped from the index definition and the non-unique keyword
is used in the create index statement.
In DataFlex one uses the RECNUM segment to create an index that "allows duplicates" (Name, Recnum). In a CLI based environment one would use the record identifier to achieve the same thing (Name, DFRecnum).
Depending on the back end, the rules for legal
table and column construction vary. A table whose definition is perfectly legal
in DataFlex cannot be converted to the back end if the resulting table will not
have a legal construction for that back end.
This issue only applies to ODBC. SQL Server and DB2
both have bigger limits than DataFlex.
A special category in table definition is reserved
words and table, column name limits. There is no problem in using reserved words
as table or column names. We are currently not aware of any table or column name
that cannot be created in SQL Server or DB2.
In ODBC, however, it all depends on the database.
If that database allows the so-called quoted identifiers, reserved words can be
used as table or column names. If the database does not, there is a problem in
converting DataFlex tables with a reserved word used for table or column names.
In SQL Server and DB2, there are no column name
length issues. The name length may be an issue in ODBC. dBase, for example,
allows column names to be of 8 positions. The CLI based Connectivity Kits will
force column names to be unique within a table. So, when converting to a dBase
table with column names “CREDITORADDRESS” and ”CREDITORCITY” would
result in a dBase table with physical column names “CREDITOR” and
“CREDITO1”. The names will be “uniqyfied” much like Windows does for
short directory names. The table definition as DataFlex presents it will still
contain the long names since these are stored in the intermediate file.
It is possible that connecting to existing CLI based tables results in tables that are illegal in DataFlex! One example is a Paradox table that used “File_Number” as a column name. In such cases the illegal columns can be renamed in the intermediate file.
CLI back ends do not support overlaps at all. This
is only a problem if the overlap does not completely span a number of fields.
Overlaps that completely span fields are supported
by the CKs. You can use them in indexes. The sort order, however, is different.
In DataFlex an overlap sorts based on binary comparison. In CLI we simulate the
overlap and have to create an index made up of the underlying fields. This will
be sorted on whatever collating logic is used for the type of the segments.
Overlaps that do not completely span fields (underlaps) are only supported if they are not used in indexes. Underlaps will be converted into overlap fields (the one that completely spans fields) at conversion time. The user will have to edit the intermediate file to get underlap fields. Every Strucuture_end operation on the table that includes a rewrite of the intermediate file will also convert underlaps to overlaps. The user again will have to edit the intermediate file to get underlaps.
CLI back ends do not support case sensitive
segments.
The index is created with the same segments as in DataFlex. The case sensitive setting for index segments is completely ignored. If a programmer wants to enforce a rule on a column to avoid duplicate column data no matter in what casing the data is in, he should look at the check constrain support of the back end. Another option would be to create a trigger that checks for the existence of a record with the same uppercased value and generates an error if it exists, thus rolling back the transaction. Triggers should be defined on update and insert in that case.
Locking is a completely different animal in both
environments. The complete lack of any form of lock function or command in the
CLI API definition demonstrates this difference. This may sound more serious
than it actually is.
The lack of a lock function does not mean there is
no locking. Every multi-user environment - database, operating system or other -
supports some form of locking to guarantee correct results.
Most CLI based back ends support record locking.
The bigger database environment use so-called escalation scheme’s where record
locks can escalate into page locks and eventually table locks. The threshold for
escalating the lock granularity usually is a percentage. So, if more than X
percent of the records on a page is locked, lock the page. If more than Y
percent of the pages in a table is locked, lock the table. Both SQL Server and
DB2 support record locking.
Some CLI back ends support so-called optimistic
locking. All locking fundamentals used in DataFlex require so-called pessimistic
locking. Most back ends that support optimistic locking offer the possibility to
change to pessimistic locking. If the back end only supports optimistic locking,
it cannot be used. Both SQL server and DB2 use pessimistic locking.
On a technical DataFlex level there are
differences:
Beginning
a transaction on DataFlex tables will lock all tables at the moment the
explicit or implicit transaction starts. Beginning a transaction on CLI
tables will not lock anything at all.
Inside
a transaction, every “find eq by recnum” command on a CLI table will
lock the record if the back end supports positioned updates for the cursor
in use. If positioned updates are not supported and SQLSetPos for SQL_UPDATE
and SQL_LOCK_EXCLUSIVE is, SQLSetPOs is used. Both SQL Server and DB2
support positioned updates.
Inside
a transaction, every find command that is not a “find eq by recnum” will
lock the record if the back end supports the use of SQLSetPos for SQL_UPDATE
and SQL_LOCK_EXCLUSIVE on the current record. Both SQL Server and DB2
support positioned updates.
If
the back end does not support positioned updates or SQLSetPos, the records
that are updated will be locked when they are actually updated.
For more information, see http://www.dataaccess.com/whitepapers/TransactionsLockingandDataFlex.htm.
The DF_FILE_MODE attribute is supported in the
sense that you can set and get it. Setting it will not have any effect. All this
does is to store the value so when you get it at a later time it still is what
you have set it to.
CLI does not support a concept that comes close to a file, so it also does not support file modes. In an SQL environment, one uses tables as the base for statements. The tables themselves cannot be set to read-only or read-write other than by setting up privileges (rights) for certain users. In order to be able to work with data in environments like DataFlex cursors are used. Cursors do have modes like read-only and such but they are conceptually different from file modes. All cursors created outside a transaction are set to read-only, all cursors created inside a transaction to “lockable”.
DataFlex uses a so-called zero date when a date
column has not been given a value. CLI
based back ends do not support anything like a zero date. When a column’s
value is unknown, CLI back ends use NULL values. DataFlex does not support
anything that comes close to NULL values.
Zero dates are widely used in existing DataFlex environments.
In order to handle zero dates, a mechanism called
dummy zero date was introduced. When a table does not use the dummy zero date
logic the following applies:
When
a DataFlex zero date is moved from DataFlex to CLI, it is converted to NULL.
When
a NULL values is moved from CLI to DataFlex it is converted to the DataFlex
zero date.
Note that we recommend not to use NULL values. See
also the compatibility issue NULL Collating and
the performance issue Null Sorting.
The dummy zero date uses a given date value and
interprets it as if it were the DataFlex zero date. In DB2 that value is
“01/01/0001”, for SQL Server it is “01/01/1753”, for ODBC it is set to
“01/01/0001” by default but it can be adjusted. When a table uses the dummy
zero date logic the following applies:
When
a DataFlex zero date is moved from DataFlex to CLI, it is converted to the
dummy zero date.
When
a date value that is equal to the dummy zero date is moved from CLI to
DataFlex it is converted to the DataFlex zero date.
When using non-DataFlex tools or Embedded SQL to query the database you should be aware of these settings of a particular table to interpret the data in a correct way.
This chapter describes performance issues.
In general you should be aware that there are a lot of things that influence performance. Performance is influenced by the hardware on which the database server runs, the number of clients accessing the database, the speed of the connection between the database client and server and much more.
When
data is converted from DataFlex to an SQL database, Nullable columns (columns
that will accept a null value) are, by default, never supported/created. We
suggest that you do not change this default. If you do support Nullable columns
and those columns are used in indexes, you will need to be aware of collating
and performance issues. We suggest that you avoid using any Nullable columns and
we strongly advise you not to use Nullable columns that are part of indexes.
When an index contains one or more columns that allow null values and you are finding in a way that null values are part of the result set, finding is slow. The same find operation on the same data but with index segments that do not allow null values will be much faster.
Find
speed on CLI back ends greatly depends on how “optimizable” the resulting
SQL statements and result sets are. We are constantly looking into even better
optimization logic and improvements in this area.
You should be aware that when comparing find speed between DataFlex and CLI based databases there are no absolutes. Sometimes a certain operation will be faster when using DataFlex data, sometimes it will be faster when using CLI based data.
Not
all CLI back ends support descending segments in indexes. This is not a problem
logically because we do not need indexes to sort by certain columns. In the case
a certain index cannot be created in the back end, the Connectivity Kit falls
back and defines it in the intermediate file only. Logically, there is no
difference between the DataFlex and CLI based table.
Physically,
there is a difference because one index does exist in DataFlex and not in CLI.
This will make finding on that index slower.
DB2 supports descending index segments in all of its Universal Database versions. Microsoft SQL Server started to support descending index segments in SQL Server 2000, earlier versions do not support descending index segments. For ODBC it depends on the back end that is used.
To
implement zerofile, all rows from a table are deleted by issuing a “Delete
from X” statement (without a where clause). This can be quite slow especially
if the transaction is logged, triggers actions or otherwise involves extra
database activity.
In SQL Server we use a special SQL construction “Truncate Table X” that is much faster because it will not be logged or trigger other database activity.
The
Structure_End logic speed can vary depending on the changes made. If the changes
made require a recreate of the table, we will:
Create
a temporary file with the same original definition
Copy
all data to the temporary file
Drop
the original table
Create
a new table with the new definition
Create
the indexes
Copy
all data from the temporary table to the new
Drop
the temporary table
If
only indexes were changed, the appropriate indexes are dropped and the new
indexes are created. Any non-touched indexes will not be dropped!
If
only attributes were changed that are stored in the intermediate file, only the
intermediate file is recreated.
The three Structure_End scenarios are in descending order for the time they take. The longest operation is first, the shortest last.
Insert
and update in CLI back ends tend to be a bit slower than in a DataFlex
environment. Most environments where we tested this also implement transaction
logging with roll back and roll forward support. DataFlex does not support
transaction logging other than via Novell TTS.
You
should be aware that when comparing update and create speed between DataFlex and
CLI based databases there are no absolutes. Sometimes a certain operation will
be faster when using DataFlex data, sometimes it will be faster when using CLI
based data.
Data Access
Worldwide
14000 SW 119 Ave
Miami, FL 33186
305-238-0012
Domestic Sales: 800-451-3539
Fax: 305-238-0017
email: sales@dataaccess.com
Newsgroup Server: news.dataaccess.com
Internet: http://www.dataaccess.com
Data Access Worldwide - Asia Pacific
Suite 5, 333 Wantirna Road, Wantirna VIC 3152 Australia
Phone: +61 3 9800 4233 f: +61 3 9800 4255
Sales: asiapacific@DataAccess.com
Support: support.asiapacific@DataAccess.com
Internet: http://www.DataAccess.com/AsiaPacific
Data Access Worldwide - Brasil
Av.Paulista, 1776 - 21st.Floor
São Paulo -SP - Brazil
CEP 01310-921
Phone: 5511-3262-2000
Fax 5511-3284-1579
Sales: info@dataaccess.com.br
Support: suporte@dataaccess.com.br
Internet: http://www.dataaccess.com.br
Data Access Worldwide - Europe
Lansinkesweg 4
7553 AE Hengelo
The Netherlands
Telephone: +31 (0)74 - 255 56 09
Fax: +31 (0)74 - 250 34 66
Sales: info@dataaccess.nl
Support: support@dataaccess.nl
Internet: http://www.dataaccess.nl
Data Access
Technical Support
800-451-3539 / 305-232-3142
email: support@dataaccess.com
Visit our Support Home page to see all of our Support options: http://www.dataaccess.com/support
Data Access Technical
Knowledge Base http://www.dataaccess.com/kbase
Many answers to technical
problems can be found online in the Data Access Technical Knowledge Base. Here,
you can access the same live data that Data Access Worldwide technical support
and development staff use to enter and track technical articles.
Copyright Notice
This document is property of Data Access Corporation. With credit to Data
Access Corporation for its authorship, you are encouraged to reproduce
this information in any format either on paper or electronically, in
whole or in part. You may publish this paper as a stand alone document within
your own publications conditional on the maintenance of the intent,
context, and integrity of the material as it is presented here.
DataFlex is a registered trademark of Data Access Corporation.
NO LIABILITY FOR CONSEQUENTIAL
DAMAGES
To the maximum extent permitted by applicable law, in no event shall Data Access
Corporation be liable for any special, incidental, indirect, or consequential
damages whatsoever (including, without limitation, damages for loss of business
profits, business interruption, loss of business information, or any other
pecuniary loss) arising out of the use of or inability to use any information
provided in this document, even if Data Access Corporation has been advised of
the possibility of such damages. Because some states and jurisdictions do not
allow the exclusion or limitation of liability for consequential or incidental
damages, the above limitation may not apply to you.