Sybase ASE: Dump load failed

We’ve had the following issue: After a problem on a customer database, we needed to restore a single table. A compressed dump was available containing the latest version of this one table but an older version of the other tables. So the plan was to load the dump on a test system with the same device layout, ASE version and operating system, export the table and import it on the live system.

Unfortunately while loading the dump, we got the following error:

Oct  7 10:57:00 2013: Backup Server: 2.23.1.1: Connection from Server SYBASE on Host myhost with HostProcid 7705.
Oct  7 10:57:00 2013: Backup Server: 4.132.1.1: Attempting to open byte stream device: 'compress::9::/dump1_1/mydb.201310030015.000::00'
Oct  7 10:57:00 2013: Backup Server: 6.28.1.1: Dumpfile name 'mydb13276003C2   ' section number 1 mounted on byte stream 'compress::9::/dump1_1/mydb.201310030015.000::00'
Oct  7 10:57:23 2013: Backup Server: 4.188.1.1: Database mydb: 2310872 kilobytes (1%) LOADED.
Oct  7 10:57:32 2013: Backup Server: 4.124.2.1: Archive API error for device='compress::9::/dump1_1/mydb.201310030015.000::00': Vendor application name=Compress API, Library version=1, API routine=syb_read(), Message=syb_read: gzread() error=-1 msg=1075401822
Oct  7 10:57:32 2013: Backup Server: 6.32.2.3: compress::9::/dump1_1/mydb.201310030015.000::00: volume not valid or not requested (server: n byte stream 'cu@ess::9::/dump1_1/mydb.20┬ĄD, session id: 17.)
Oct  7 10:57:32 2013: Backup Server: 1.14.2.4: Unrecoverable I/O or volume error.  This DUMP or LOAD session must exit.

So it looks like there was a problem uncompressing the dump. I am not sure where the strange characters in the second-to-last line come from, but I am also not sure they are related to the problem.

Reading the header from the dump as described in a previous post worked fine. So the dump was not completely corrupt. It’s also the reason why the first percent of the dump could be loaded.
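
For reference, reading the header is done with the headeronly option, e.g.:

1> load database mydb from "compress::9::/dump1_1/mydb.201310030015.000" with headeronly
2> go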

We also tried loading the dump using the “with listonly” option but it failed:

1> load database mydb from "compress::9::/dump1_1/mydb.201310030015.000" with listonly=full
2> go
Backup Server session id is:  26.  Use this value when executing the 'sp_volchanged' system stored procedure after fulfilling any volume change
request from the Backup Server.
Backup Server: 4.22.1.1: Option LISTONLY is not valid for device 'compress::9::/dump1_1/mydb.201310030015.000::00'.

I never found out why it wasn’t possible to use listonly on this dump file, but I didn’t really have time to look into it in detail…

Then I saw that there was a “with verify only” option. Here is the syntax from the Sybase documentation:

load database database_name
from [compression=]stripe_device
[at backup_server_name ]
[density = density_value,
blocksize = number_bytes,
dumpvolume = volume_name,
file = file_name]
with verify only [ = header | full]

But it failed, saying there was an error near “only”… Then I wondered why the syntax would be “with headeronly” and “with listonly” but “with verify only”, i.e. with an extra space. So we tried without the space and it worked. Well, kind of… It could still read the header but failed with the same error message while reading the rest.
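
So the command which (partially) worked looked something like this, with verifyonly written as one word:

1> load database mydb from "compress::9::/dump1_1/mydb.201310030015.000" with verifyonly=full
2> go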

Next I thought it might have been a problem while transferring the dump through FTP (I wasn’t sure whether the transfer was done in binary or ASCII mode). One way to check it is to search for \r\n characters in the dump. It can be done using the od command. od dumps files in octal and other formats. You can use the -c option to show special characters as escape characters (i.e. \r and \n). So you need to run od and pipe it to a grep e.g.:

od -c mydb.201310060030.000 | grep '\\r  \\n'

If this returns something, then you probably used the wrong FTP transfer mode. It wasn’t the case here. Also, the file on-site and the one we got had the same number of bytes, which also indicates that this wasn’t the issue.

Another thing you need to check is whether uncompressing failed because of memory or disk space issues. In our case we had plenty of free disk space and RAM available.

Another thing I found while googling for a solution was the following in a newsgroup:

Backup Server will use asynchronous I/O by default and there was a CR 335852 to work around this behavior. Try starting backupserver using trace flag -D32 .

CR Description :-
6.21 Dumping or loading databases with asynchronous I/O
[CR #335852] On an IA32 running Red Hat, running a dump or load database command can cause Backup Server to stop responding when using asynchronous I/O. Backup Server uses asynchronous I/O by default.

Workaround: Start Backup Server using trace flag -D32 to force synchronous I/O.

So we tried adding the flag to the start script of the backup server. But it didn’t help. Anyway we didn’t know whether the problem was during loading or whether there had been a problem while dumping.

The next thing that came to mind was to try to uncompress the dump file manually to see whether it was corrupt. This can be done with gunzip. You just need to rename the file in case it doesn’t have a valid gzip extension, e.g.:

mv mydb.201310060030.000 mydb.201310060030.000.gz
gunzip mydb.201310060030.000.gz

In our case it failed. We repeated it on a dump file we knew was fine and it worked, so we had found the source of the problem: the dump stripe was corrupt.
Repeating it on the dump on-site worked. So the stripe was not corrupt after the dump but was somehow corrupted during the transfer. All we had to do was transfer it again.

I’m not too sure why the stripe got corrupted during the transfer, but I was happy it didn’t get corrupted while dumping, as we had feared in the beginning.

Sybase: Convert seconds to a human readable duration

Let’s say you get a number of seconds and want to convert it to a string so that it’s human readable e.g. saying that something took 15157 seconds is probably not a good idea and it’d make more sense to say it took 4 hours 12 minutes and 37 seconds.

Here’s a very simple stored procedure to perform this conversion:

create procedure hours_min_sec(@seconds int)
as
begin
select right('00'+convert(varchar(2), floor(@seconds / 3600)), 2)+':'+right('00'+convert(varchar(2),floor(@seconds / 60) % 60), 2)+':'+right('00'+convert(varchar(2),@seconds % 60), 2)
end
go

Note that you may see a line wrap here, but everything after begin and before end should be on one line.

You can use it like this:

1> hours_min_sec 15157
2> go

 --------
 04:12:37

(1 row affected)
(return status = 0)
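
You can of course also feed it a duration computed with datediff, e.g. (the start and end times below are just made-up values):

declare @start datetime, @end datetime, @seconds int
select @start = convert(datetime, 'Oct  7 2013 10:00AM')
select @end = convert(datetime, 'Oct  7 2013  2:12PM')
select @seconds = datediff(ss, @start, @end)
exec hours_min_sec @seconds
go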

Sybase: Updating statistics

In ASE, statistics are used by the Optimiser to determine the appropriate Query Plan. Wrong or out-of-date statistics can cause the wrong query plan to be selected which can lead to massive performance issues. It is thus very important to consider statistics in regular database maintenance activities and make sure they are updated on a regular basis.

There are a few ways in which you can make sure that the statistics are up-to-date and support the query optimizer in choosing the optimal query plan; they are described in the sections below.

At the end of the article, I’ll also briefly discuss the performance considerations involved in choosing the appropriate command.

Update Statistics

Using the update statistics command, you can update the statistics about the distribution of key values for a table, for an index on the table, for specific columns of the table or for partitions.

Only reference the table

If you only reference a table name:

update statistics mytable

it will update statistics for the leading columns of all indexes on the table. The leading column of a composite index is the first column in the index definition. This is the most important one since the index cannot be used if only the second column is used in the WHERE clause but not the first one.

Here’s an example of such an index:

create index mytable_comp_ind on mytable(col1, col2)

In this case, col1 is the leading column for this index.

Explicitly reference an index

If you explicitly reference an index on the table:

update statistics mytable mytable_comp_ind

it will update the statistics for the leading column of the specified index i.e. it will update the statistics on col1.

Explicitly reference a partition

If you explicitly reference a partition on the table:

update statistics mytable partition mypartition

it will update the density information on all columns of all indexes of the partition and additionally create histograms for all leading columns of indexes of the partition. You can also provide a list of column names; in this case it will create histograms for the first column and densities for the composite columns.

Note that only local indexes of the partition are considered. Global indexes are not considered.
Also note that updating the statistics for a partition also updates the global statistics i.e. only local indexes are considered but global statistics are updated with the gathered data.
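
Combining a partition with a list of column names would then look something like this (a sketch; I haven’t double-checked this exact form against the documentation):

update statistics mytable partition mypartition (col1, col2)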

Explicitly reference columns

If you explicitly reference a list of columns:

update statistics mytable (col1, col2)

it will create histograms for the first referenced column and density information for all column groups with the first column as leading column.

Note that performing such an update statistics has two consequences:

  1. A table scan is performed
  2. A sort is performed, meaning that you need to have enough space in tempdb

So this update statistics does have a high impact on the performance of the server and requires more space in tempdb.

Also note that in many cases statistics are mostly useful for columns referenced by indexes (especially as leading columns). Updating statistics for other columns creates an overhead. But in some cases it is required and better than using update all statistics.

Update Index Statistics

The command update index statistics updates the statistics of all columns of an index:

update index statistics mytable mytable_comp_ind

The difference between this command and:

update statistics mytable(col1, col2)

is that the latter focuses on the lead column of the index. The former creates histograms not only for the leading column of the index but for all columns specified in the index definition.

You can also update the statistics for all columns of all indexes on the table by omitting the index name:

update index statistics mytable

Note that it is not exactly the same as calling update index statistics for each index on the table, since the latter updates the statistics of columns referenced by multiple indexes multiple times. So if you intend to update the statistics for all indexes, it’s more efficient to omit the index name than to issue one command per index.

Here also you can specify a data or index partition.
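
If I read the syntax correctly, specifying a data partition would look something like this (again a sketch rather than a verified command):

update index statistics mytable partition mypartition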

From a performance point of view, this means that:

  1. An index scan will be performed to update the statistics of the leading column of the index
  2. For each of the other columns, a table scan will be required, followed by a sort using a work table in tempdb

This is basically the same problem as when using update statistics on columns which are not the leading column of an index.

Update All Statistics

This command creates statistics for all columns of the specified table:

update all statistics mytable

A partition can also additionally be specified.

From a performance point of view, this means that:

  1. An index scan will be performed for each column which is the leading column of an index
  2. A table scan will be performed for each other column, creating a work table in tempdb which will then be sorted

Update Table Statistics

The update table statistics command is a bit of a different beast. It doesn’t update the data in systabstats but only table- or partition-level statistics in sysstatistics, i.e. it does not affect column-level statistics.

Here’s the syntax:

update table statistics mytable

Or with a partition name:

update table statistics mytable mypartition

Modifying Statistics

sp_modifystats is a system stored procedure which can be used to update the density information for a column group of a table.

There are two uses of this command.

Modify density information

You can use the MODIFY_DENSITY parameter to change the cell density information for columns:

sp_modifystats mytable, "col1%", MODIFY_DENSITY, range, absolute, "0.5"

This will set the range cell density to 0.5 for the column col1 as well as for all column groups with density information having col1 as leading column.

The column group parameter can either contain:

  • “all” meaning that all column groups should be updated
  • A column name meaning that only the density information for this column should be updated
  • A list of columns meaning that the density information of this specified column group should be updated
  • A column name or column list including wildcards as shown above

Note that the next update statistics will reset the density information. So it’s a good idea to always call sp_modifystats after an update statistics.
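
A maintenance sequence would therefore typically look something like this (reusing the example values from above):

update statistics mytable
go
exec sp_modifystats mytable, "col1%", MODIFY_DENSITY, range, absolute, "0.5"
go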

Handling skew in data

Please refer to my previous post regarding data skew, its effect on queries and how to handle it: Data skew and query plans.

Performance

Updating statistics is a required maintenance activity if you want to keep your queries fast. On the other hand, it also has a non-negligible impact on performance and load.

In general, the leading columns of indexes are the critical ones, so you need to make sure their statistics are always up-to-date.
But in some cases it also makes sense to have up-to-date statistics for other columns: if a WHERE clause contains a column which is not part of an index and 99% of the values in this column are the same, this will greatly impact how joins are done. Without statistics on this column, a default distribution will be assumed and the wrong join might be selected.

So I’d recommend:

  1. Use “update statistics” instead of the other commands to update the leading columns of indexes
  2. Use “update statistics” on specific columns which are not the leading column of an index if you see that ASE chooses the wrong join because it assumes a default distribution of data which does not match reality (see the sketch after this list). Use it only if really required, as it creates a huge load on the system
  3. Avoid using “update all statistics”. It generally makes more sense to use dedicated commands to update what needs to be updated
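
As a minimal sketch of recommendations 1 and 2 (mytable and status_col are just placeholder names):

update statistics mytable               -- recommendation 1: refresh the leading columns of all indexes
update statistics mytable (status_col)  -- recommendation 2: only if this non-indexed column misleads the optimizer
go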

More information regarding sampling, histogram steps and the degree of parallelism will be added later…

Sybase: Locks when creating indexes

Got the following question from a colleague:

We have a table with allpages lock scheme. A nonclustered index is added to this table (which has millions of entries). During the creation of the index, many blocked delete commands are observed. Would having another lock scheme (data page lock or data row lock) help ?

Answer:

The locking scheme of the table doesn’t matter in this case. When an index is created, a table-level lock is always acquired. If the index is clustered, the lock is exclusive, i.e. SELECT, INSERT, UPDATE and DELETE operations are blocked. Otherwise it is only a shared lock, i.e. INSERT, UPDATE and DELETE statements are still blocked but SELECT statements are not. In this case it is a nonclustered index, but for a DELETE statement it makes no difference: the statement is still blocked.

Sybase: Find long lasting connections

In order to find which connections have been open for a long time, you can use the following SELECT statement:

SELECT
   spid,
   sl.name as 'login',
   sd.name as 'database',
   loggedindatetime,
   hostname,
   program_name,
   ipaddr,
   srl.remoteusername as 'remotelogin',
   ss.srvname as 'remoteservername',
   ss.srvnetname as 'remoteservernetname'
FROM
   master..sysprocesses sp,
   master..syslogins sl,
   master..sysdatabases sd,
   master..sysremotelogins srl,
   master..sysservers ss
WHERE
   sp.suid>0
   AND datediff(day,loggedindatetime,getdate())>=1
   AND sl.suid = sp.suid
   AND sd.dbid = sp.dbid
   AND sl.suid *= srl.suid
   AND srl.remoteserverid *= ss.srvid
ORDER BY
   loggedindatetime

Here are a few explanations:

  • sp.suid>0 filters the ASE processes which run with suid=0
  • datediff(day,loggedindatetime,getdate())>=1 keeps only processes whose login date/time is at least 1 day old
  • if the login is a remote login, the remote login and remote server information will be filled in; otherwise these columns will contain null

Sybase: Performance issue with clustered index on allpages locked tables

When a table has a clustered index, ASE makes sure that all rows are physically stored in the order defined by the columns on which you have the clustered index. There can only be one clustered index on a given table as ASE cannot store the data with two different orders.

You can either create it explicitly using the following command:

create clustered index myclusteredindex on mytable(mycolumn)

Or create it implicitly by defining a primary key constraint:

create table mytable(
	mycol1 int not null,
	mycol2 int not null,
	primary key (mycol1)
)

Note that if you define a primary key, the index name is automatically generated and you cannot drop the index using the DROP INDEX command; you need to drop the constraint instead.
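
Here’s a sketch of how that looks; the constraint name is system-generated, so the one below is just a made-up example (sp_helpconstraint shows the actual name):

sp_helpconstraint mytable
go
alter table mytable drop constraint mytable_1234567890
go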

If the table has a data-only lock scheme (datapages or datarows), the table will be reorganized when the clustered index is created but the order of rows will not be maintained afterwards. If the table has an allpages lock scheme, ASE will make sure that the order is maintained.

Note that although clustered indexes (especially when defined through primary key constraints) are often created as unique indexes, it doesn’t have to be the case.

Reading from a table with allpages lock and a clustered index using the keys of the clustered index as criteria is almost always faster than without the clustered index. But writing to the table is slower since ASE needs to maintain the order. This can create huge performance issues when working with huge tables with many updates on the index keys or many inserts/deletes. In some cases (I observed one on a table with 28 million entries), committing or rolling back changes on such a table can cause many physical IOs. If this is done in a transaction also involving updates on other tables, this can cause many other connections to be blocked. In the case I observed, it took up to 30 minutes to finish some rollbacks. My assumption was that ASE needs to reorganize the whole index, which involves reading and writing many pages. In this case, dropping the primary key constraint solved the problem. You can just replace the clustered index by a non-clustered one.

So I’d recommend in general not to use clustered indexes on huge allpages-locked tables unless you are 100% sure that you need the extra read performance. The penalty while writing can be so huge that it cannot be compensated by the extra read speed.

Sybase: Why does changing the locking scheme take so long ?

If you’ve already changed the locking scheme of a huge table from allpages locking (which was the only locking scheme available in older versions of ASE) to datapages or datarows locking, you’ve noticed that it takes a long time and keeps your CPU pretty busy.

The reason is that switching between allpages locking and data-only locking basically means copying the whole table and recreating the indexes.
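
For reference, the switch itself is done with alter table, e.g.:

alter table mytable lock datarows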

Here are all the steps involved in such a change:

  • All rows of the table are copied into new data pages and formatted according to the new locking scheme.
  • All indexes are dropped and then re-created.
  • The old table is removed.
  • The information regarding this table in the system tables is updated.
  • The table counter is updated to force recompilation of query plans.

Note that this is also the reason why some people use this switching back and forth as a way to defragment a table.

The first two steps are the ones taking the most time. It’s difficult to estimate the time required for this, but you can get an idea by checking the size of the data and indexes for this table. This can be done using sp_spaceused:

1> sp_spaceused report
2> go
 name   rowtotal reserved   data      index_size unused
 ------ -------- ---------- --------- ---------- --------
 report 1837164  8377528 KB 687468 KB 7676644 KB 13416 KB

(1 row affected)
(return status = 0)

It doesn’t tell you how much time is needed, but if you do it on different tables, you can assume the time needed is roughly proportional to the size of data + indexes.

Note that switching between datapages and datarows locking schemes only updates system tables and is thus pretty fast.

Sybase: Access the database with C# using ODBC

I needed to write a very short C# program to access a Sybase ASE database and extract some information.

First, I had to download the appropriate version of ASE. It contains a directory called \archives\odbc with a setup.exe. Just run it.

Now there is a new driver available:

(Screenshot: the ODBC “Create New Data Source” dialog showing the new ASE driver.)

There is no need to add a data source to access ASE from an ODBC connection using C#. I just went there to check whether the driver was properly installed.

Then just create a program connecting to ASE using the following connection string:

Driver={Adaptive Server Enterprise};server=THE_HOSTNAME;port=2055;db=THE_DB_NAME;uid=sa;pwd=THE_SA_PASSWORD;

If you omit the db=… part, you’ll just land in the master database.

With this connection, you can then execute statements.

Here’s some sample code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Data.Odbc;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            String errorMsg;
            OdbcConnection con = Connect("sa", "sa.pwd", "2055", "192.168.190.200", "mydb", out errorMsg);
            Console.WriteLine(errorMsg);
            if (con != null)
            {
                Console.WriteLine("In database {0}", con.Database);
                OdbcCommand command = con.CreateCommand();
                command.CommandText = "SELECT name FROM sysobjects WHERE type='U' ORDER BY name";
                OdbcDataReader reader = command.ExecuteReader();
                int fCount = reader.FieldCount;
                for (int i = 0; i < fCount; i++)
                {
                    String fName = reader.GetName(i);
                    Console.Write(fName + ":");
                }
                Console.WriteLine();
                while (reader.Read())
                {
                    for (int i = 0; i < fCount; i++)
                    {
                        String col = reader.GetValue(i).ToString();
                        Console.Write(col + ":");
                    }
                    Console.WriteLine();
                }
                reader.Close();
                command.Dispose();
                Close(con);
                Console.WriteLine("Press any key to continue...");
                Console.ReadLine();
            }
        }

        private static OdbcConnection Connect(String strUserName, String strPassword, String strPort, String strHostName, String dbName, out String strErrorMsg)
        {
            OdbcConnection con = null;
            strErrorMsg = String.Empty;
            try
            {
                String conString = "Driver={Adaptive Server Enterprise};server=" + strHostName + ";" + "port=" + strPort + ";db=" + dbName + ";uid=" + strUserName + ";pwd=" + strPassword + ";";
                con = new OdbcConnection(conString);
                con.Open();
            }
            catch (Exception exp)
            {
                con = null;
                strErrorMsg = exp.ToString();
            }

            return con;
        }

        private static void Close(OdbcConnection con)
        {
            con.Close();
        }
    }
}

That was really easy! Not that it’d have been much more difficult with Java or PHP, but I’d have expected to waste hours making mistakes and trying to debug it…

Sybase: Disable the transaction log

In some cases, you keep getting problems with a transaction log always filling up but do not need to be able to restore all data in a disaster recovery scenario (e.g. because you’re initially filling the database and if something goes wrong, you can just repeat the process or because it’s a test or development system and you do not care about the data).

Unfortunately, it is not possible to completely disable it. But you can make sure that the transaction log is truncated on every checkpoint. The transaction log is then still there and still costs you resources, but it will be cleared at every checkpoint, which will prevent it from filling up.

In order to have it truncated on checkpoint, use the following command:

exec master..sp_dboption mydb, 'trunc log on chkpt', true

Replace mydb by the name of the database on which you want to perform it.
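
Note that, depending on the ASE version, you may additionally need to issue a checkpoint in the database for the option change to take effect:

use mydb
go
checkpoint
go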

To reenable the transaction log, just execute the following:

exec master..sp_dboption mydb, 'trunc log on chkpt', false

Sybase ASE: Get first or last day of previous, current or next month

Important: This post is about Sybase ASE. It will not work in iAnywhere.

Suppose you want to provide some filter possibilities in your application, showing the data stored for the previous, the current or the next month. You basically need to figure out the first and last day of the corresponding month.
In case you cannot or do not want to do it in the application code itself, you can use simple SQL statements to get this info.

First you need to figure out the current date:

declare @today datetime
select @today=getdate()

Let’s start with the current month:

Today is January 30th, so the first day of the month is January 1st. Unfortunately, you cannot directly set the day to 1. But you can extract the day from the date, in our case 30, and 1 is 30 - (30 - 1), so:

declare @today datetime
select @today=getdate()
select dateadd(dd,-(day(@today)-1),@today)

This will return: Jan 1 2013 1:49PM

So basically we have the right date but for comparison purposes, we’d rather have midnight than 1:49pm. In order to do it, you need to convert it to a date string and back to a datetime:

declare @today datetime
select @today=getdate()
select convert(datetime, convert(varchar(10),dateadd(dd,-(day(@today)-1),@today),101))

Now we get: Jan 1 2013 12:00AM

If you’re only interested in a string containing the date, just drop the outer convert:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(dd,-(day(@today)-1),@today),101)

Use another format than 101 if needed. The complete list of date conversion formats can be found here. For example, for the German date format, use 104 (dd.mm.yyyy).

Now let’s get the last day of the current month. This is basically the day before the first day of next month.

So first let’s get the first day of next month. This is actually just 1 month later than the first day of the current month:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(mm,1,dateadd(dd,-(day(@today)-1),@today)),101)

This returns: 02/01/2013

Now let’s just substract one day and we’ll get the last day of the current month:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(dd,-1,dateadd(mm,1,dateadd(dd,-(day(@today)-1),@today))),101)

This returns: 01/31/2013

Since we already have the first day of next month, let’s get the last day of next month. This is basically the same again but instead of adding 1 month, you add 2 months:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(dd,-1,dateadd(mm,2,dateadd(dd,-(day(@today)-1),@today))),101)

This returns: 02/28/2013

Now let’s tackle the previous month. The first day of last month is basically the first day of the current month minus 1 month:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(mm,-1,dateadd(dd,-(day(@today)-1),@today)),101)

This returns: 12/01/2012

And then the last day of previous month. It is one day before the first day of the current month:

declare @today datetime
select @today=getdate()
select convert(varchar(10),dateadd(dd,-(day(@today)),@today),101)

This returns: 12/31/2012
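
Putting it all together, a typical use of these dates is filtering rows for the current month. Here’s a minimal sketch (mytable and mydate are just placeholder names); comparing with “less than the first day of next month” avoids having to worry about the time part on the last day:

declare @today datetime
declare @first datetime
declare @next datetime
select @today = getdate()
select @first = convert(datetime, convert(varchar(10),dateadd(dd,-(day(@today)-1),@today),101))
select @next = dateadd(mm,1,@first)
select * from mytable where mydate >= @first and mydate < @next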