Sybase ASE: using CPU affinity to prevent all engines from using the same CPU

Number of engines

ASE creates an OS-level process for each engine you define. You can change the number of engines by using the following command:

sp_configure "engine", N

(replace N by the number of engines you want to configure).

Usually, if you server is almost exclusively used for ASE and you have X CPU cores available, you will want to configure X-1 engines. So assuming you have a dedicated server with 4 CPU cores, you’ll want to configure 3 engines.

You can change this setting also in the SYBASE.cfg:

[Processors]
max online engines = 3
number of engines at startup = 3

The first line defines that there will be 3 engines and the second one that all 3 engines will be started by default.

Even though in many cases, it makes sense to set the same value to both parameters so that you automatically use all available engines. You can also set the second one to a lower value and benchmark the system with less engines and then bring one additional engine online after another.

Increasing the max number of online engines to higher number than the number of available logical CPU’s makes no sense. So I’d always recommend setting it to the total number of logical CPU’s or this number minus 1. Whether you bring them all online at startup or not depends on what else is running on the system and the specific workload you have on this server.

If you configure too many ASE engines for the underlying CPU’s, you will observe some significant loss of throughput. It is due to the high number of involuntary context switches.

Hyper-Threading

Hyper-Threading creates “virtual CPU’s”. So an application running on a system where Hyper-Threading is enabled will think that there are twice as many CPUs as physically available. ASE will not make any difference between virtual CPU’s and real CPU’s.

Although Hyper-Threading provides the ability run two ASE engines for one physical processor, you need to keep in mind that it is still not equivalent to running two engines with two physical processors.

In many cases, you should consider switching off Hyper-Threading. Except if you actually only have very few physical CPU cores available, HT will probably not bring you the expected benefits. You might run into problems because ASE doesn’t see that two CPU’s are basically running on the same physical CPU and it should rather distribute the load between physical CPU’s instead of distributing two queries to the 2 CPU’s running on same physical CPU. Also ASE could schedule queries run at the same time to run only on the virtual CPU’s while it would be better to run them on the real CPU’s (although theoretically, there should be no difference in performance between a virtual CPU and a real one).

But keep in mind that whether HT will bring performance benefits or on the contrary make you system slower really depends on the system itself. It highly depends on your actual hardware and workload. So benchmarking it on the specific system might still be a good idea.

ASE 15.7 comes with a threaded kernel. It takes advantage of threaded CPU architectures. It can thus reduce context switching to threads instead of processes, which brings a performance boost. But this is not related to Hyper-Threading.

Using the default kernel for ASE 15.7, each engine is a thread which lives in a thread pool instead of being an OS process (which was already the case for ASE on Windows even before ASE 15.7).

CPU Affinity

The processes for the different ASE engines have by default no affinity to the physical or virtual processors. Usually, it is not required to force any CPU affinity as ASE will handle it properly.

Here’s an example with 3 engines running and 3 heavy queries running in parallel:

ASE parallel queries good case

Here you see that there are 4 CPUs and 3 engines running using CPU0, CPU1 and CPU3. You can also press “f”, “j” and return in top to have an additional column displayed which will explicitely show which engine is using which CPU:

ASE parallel queries good case with CPU number

The column “p” shows that the 3 dataserver processes use the CPUs 0,1 and 3.

In some cases (not sure when or why this happens), you will see that all dataserver processes will be using the same CPU even though they are processing different queries. Since multiple tasks have to be handled by the same CPU, this will make each task slower and also cause alot of overhead due to task switching.

If this happens, you can use the a “dbcc tune” command to configure a CPU affinity. This can be done by using the following command:

dbcc tune(cpuaffinity, -1, "on")

The -1 parameter is the start CPU. This one will always be skipped. So setting it to -1 means that:

  • The first engine will be bound to CPU0
  • The second one to CPU1
  • The third one to CPU2

If you want to keep CPU0 for other processes, you’d use:

dbcc tune(cpuaffinity, 0, "on")

This will do the following:

  • The first engine will be bound to CPU1
  • The second one to CPU2
  • The third one to CPU3

After that you should see that all dataserver processes are using different CPUs.

Note: The setting will be active only until the ASE server is restarted. So the dbcc tune command must be reissued each time ASE is restarted.

Also note that some operating systems do not support CPU affinity. In this case, the dbcc tune command will be silently ignored.

Sybase ASE: Get one line for each value of a column

Let’s assume you have such a table:

CREATE TABLE benohead(SP1 int, SP2 int, SP3 int)

Column SP1 has non unique values and you want to keep only one row per unique SP1 value.

Assuming we have inserted the following values in the table:

INSERT INTO benohead VALUES(1,2,3)
INSERT INTO benohead VALUES(1,4,5)
INSERT INTO benohead VALUES(1,6,7)
INSERT INTO benohead VALUES(2,3,2)
INSERT INTO benohead VALUES(3,4,6)
INSERT INTO benohead VALUES(3,7,8)
INSERT INTO benohead VALUES(4,1,7)

It’d look like this:

SP1         SP2         SP3
----------- ----------- -----------
          1           2           3
          1           4           5
          1           6           7
          2           3           2
          3           4           6
          3           7           8
          4           1           7

Since SP2 and SP3 can have any value and you could also have rows where all 3 fields have the same value, it’s not so trivial to get a list looking like this:

SP1         SP2         SP3
----------- ----------- -----------
          1           6           7
          2           3           2
          3           7           8
          4           1           7

Even if the table is sorted, iterating through the rows and keeping track of the last SP1 you’ve seen will not help you since you cannot delete the second row because you do not have anything to identify it (like ROW_COUNT in Oracle).

One way to handle it is getting a list of unique SP1 values and their row count:

SELECT SP1, count(*) as rcount FROM benohead GROUP BY SP1

This will return something like this:

SP1         rcount
----------- -----------
          1           3
          2           1
          3           2
          4           1

You can then iterate through this and for each value of SP1 set a rowcount to rcount-1 and delete entries with that SP1 value. In the end, you’ll have one row per SP1 values. Of course, if you just need the data and do not want to actually clean up the table, you’ll have to do it on a copy of the table.

Instead of deleting, you can also iterate through the values of SP1 and fetch the top 1 row for this value:

SELECT TOP 1 SP1, SP2, SP3 FROM benohead WHERE SP1=1

If you had only one additional column (e.g. SP2), it’d be even easier, since you could just use MAX and GROUP BY:

SELECT SP1, MAX(SP2) AS SP2 FROM benohead GROUP BY SP1

which returns:

SP1         SP2
----------- -----------
          1           6
          2           3
          3           7
          4           1

Unfortunately this doesn’t scale to multiple columns. If you also have SP3, you cannot use MAX twice since you will then combinations which didn’t exist in the original table. Let’s insert an additional row:

INSERT INTO benohead VALUES(1,1,9)

The following statement:

SELECT SP1, MAX(SP2) AS SP2, MAX(SP3) AS SP3 FROM benohead GROUP BY SP1

will return:

 SP1         SP2         SP3
 ----------- ----------- -----------
           1           6           9
           2           3           2
           3           7           8
           4           1           7

Although we had no row with SP1=1, SP2=6 and SP3=9.

So if you don’t like the solution iterating and delete with rowcount, you’ll need to introduce a way to uniquely identify each row: an identity column.

You can add an identity column to the table:

ALTER TABLE benohead ADD ID int identity

And them select the required rows like this:

SELECT * from benohead b WHERE b.ID = (SELECT MAX(ID) FROM benohead b2 WHERE b.SP1=b2.SP1)

This will fetch for each value of SP1 the row with the highest ID.

Or you can create a temporary table with an indentity column:

SELECT ID=identity(1), SP1, SP2, SP3 INTO #benohead FROM benohead	

And then use a similar statement on the temporary table.

 

Sybase ASE Cookbook Update

I’ve written this cookbook about 10 months ago. It basically contains all the information about Sybase ASE I’ve documented on my blog over the years. I use it when I need offline access to some tricks (since I’m not getting younger, it’s sometimes useful to have a kind of brain dump somewhere). I also compiled it and published it here in the hope that someone else might find it useful.

I’ve just updated the cookbook with a few new things. But it’s really a small update.

This cookbook is still available for free. And I am still no professional writer and still cannot afford paying someone for proof-read it. So if you notice any mistakes, explanations which cannot be understood or anything like this, please leave a comment here or contact me at henri.benoit@gmail.com. I can’t guarantee how fast I can fix mistakes but I’ll do my best to do it in a timely manner.

Benohead Sybase ASE Cookbook

Sybase ASE: List all tables in the current database and their size

In order to get a list of all tables in the current database, you can filter the sysobjects table by type = ‘U’ e.g.:

select convert(varchar(30),o.name) AS table_name
from sysobjects o
where type = 'U'
order by table_name

In order to get the number of rows of each table, you can use the row_count function. It takes two arguments:

  • the database ID – you can get the ID of the current database using the db_id function
  • the object ID – it’s the id column in sysobjects

e.g.:

select convert(varchar(30),o.name) AS table_name,
row_count(db_id(), o.id) AS row_count
from sysobjects o
where type = 'U'
order by table_name

And in order to get some size information you can use the data_pages function. It will return the number of pages and you can then multiply it by the number of kilobyte per page e.g.:

select convert(varchar(30),o.name) AS table_name,
row_count(db_id(), o.id) AS row_count,
data_pages(db_id(), o.id, 0) AS pages,
data_pages(db_id(), o.id, 0) * (@@maxpagesize/1024) AS kbs
from sysobjects o
where type = 'U'
order by table_name

The first column returned by this statement contains the table name (if you have names longer than 30 characters, you should replace 30 by something higher), the number of rows, the number of data pages, the size in kilobytes.

If you have an ASE version older than 15, the statement above will not work but you can use the statement below instead:

select sysobjects.name,
Pages = sum(data_pgs(sysindexes.id, ioampg)),
Kbs = sum(data_pgs(sysindexes.id, ioampg)) * (@@maxpagesize/1024)
from sysindexes, sysobjects
where sysindexes.id = sysobjects.id
    and sysindexes.id > 100
    and (indid > 1)
group by sysobjects.name
order by sysobjects.name

This will return the table name, number of pages and size in kilobytes.

Choosing an open source NoSQL database

I’m currently evaluating which NoSQL we could use in a new project and thought I’d document the considered options are and the relevant criteria here.

First let’s see what a NoSQL database is at all. NoSQL doesn’t necessarily stand for “No SQL” but rather for “Not only SQL”. So this means it’s a database which can be worked with without using SQL but it doesn’t mean that none of them actually supports an SQL syntax. But the goal is of course not to get rid of SQL but rather to support use cases which are currently not well supported by classical relational database management system (e.g. Oracle, Sybase ASE, MS SQL Server, MySQL…).

Shortcomings of relational databases

So what are the main shortcomings of relational databases:

  1. Effort to setup and maintain.
  2. Scalability.
  3. Performance.

Effort to setup and maintain

In a relational database, all entities stored in the database must be defined with a schema known in advance. The relationships between entities have to be modeled. Whenever you need to be able to store new types of data or whenever you need to store additional attributes, you need to update your schema and make sure that existing data are made compatible to the new schema.

So the effort to setup comes from the fact that to park a car in there, you need to disassemble the car to be able to store it in the garage. This means that the whole structure of the data stored in the database needs to be known in order to create a schema which is required in order to store data. The only way to be able to store different types of data not known in advance is to store them as BLOBs which makes it impossible to use any of the advantages of relational databases later on.

The effort to maintain comes from the fact that if you want to start parking trucks additionally to cars, you need to figure out what are the parts making up a truck and what’s common between cars and trucks before you can store trucks.

Scalability

The main scalability issue with relational databases is that in order to keep data integrity and support transactions, a relational database needs to handle transactions and synchronize writes across multiple related entities which if stored on different servers requires much more effort especially to handle deadlocks. Making sure that all data related to a single entity on a single machine becomes increasingly complex. The synchronization costs tend to quickly increase as the complexity of stored data increase.

A big issue regarding scalability arises from the fact that relational databases usually require much more expensive hardware in order to scale. Scaling a relational database with commodity hardware becomes a very difficult task because of the need to support a global lock manager and distributed synchronized writes.

So basically relational database scale very well on a single server but the problem arise when you need to scale it beyond the single server deployment.

Performance

Since the car was split into individual parts in order to park it in the garage, retrieving the car from the garage means reassembling it from its parts. This is where the performance issue of relational databases comes from. Whenever you need to retrieve an entity and related data, it becomes less efficient if they are stored separately. If you stored them all together, you’d be able to retrieve the whole car much faster.

Since a relational database also needs to make sure that the integrity of the model is maintained and ensure atomicity when storing the different parts of the car, storing the car in the garage is slower than if you just stored the whole car at once.

Both points above are of course also related to the scalability issue since the cost to store and retrieve is increased even more when the different parts are physically stored on different machines.

Different types of NoSQL databases

In order to overcome the shortcoming of relational databases in some scenario, different types of NoSQL database came to life. There are basically 4 big groups of NoSQL databases:

  1. Column-oriented databases.
  2. Key-Value stores.
  3. Graph databases.
  4. Document databases.

In the sections below, I’ve listed the characteristics of the different NoSQL database and database products in each category. I’ve only considered products matching the following criteria:

  1. License: Open Source
  2. Supports disk storage
  3. Deployable on Linux, Mac OS X and Windows
  4. Deployable on an own server

The OS requirement is important for me since the final deployment will be on a Linux server (or servers) but development will be done on Mac and Windows. It is also important that the database software doesn’t put unnecessary restrictions on the operating system we’ll use in development and deploy it on in the end.

We also plan to deploy the solution on our own servers so databases which can only be used in combination with a specific cloud offering are not considered.

Column-oriented databases

A column-oriented databases stores data tables as a set of columns rather than a set of rows. They are mostly used for data warehouses and CRM systems where it’s important to be able to aggregate data over large numbers of similar data items.

Cassandra by the Apache Software Foundation seems to be the only database product in this category which seems to match the above criteria. It is released under the Apache license.

Key-Value stores

They data are stored by the application in a schema-less way. It’s value is associated to a key which uniquely identifies it.

Unfortunately, although there are many key-value stores on the market, I couldn’t find a single one matching all the criteria above. If you happen to know of such a database please let me know in the comments and I’ll update this post.

Graph databases

They are able to store elements interconnected with an undetermined number of relations between them. They are mostly appropriate for modelling social relationships, maps and transportation.

I could find two databases which match the above criteria.

The first one is Neo4j by Neo Technology. It is released under the GPL. One disadvantage of Neo4j is about scalability. It doesn’t seem to be part of the main design of Neo4j though Neo4j Enterprise seems to have some support for replication allowing performing a online backup. Without Neo4j you will need to shutdown the database and copy the database files. Neo4j also seems to lack partitioning support.

The second one is OrientDB by Orient Technologies. It is released under the Apache license.

Both products support ACID transactions. Only OrientDB supports partitioning. OrientDB supports replication out of the box. Also the license of OrientDB is more developer friendly. So if you do need a graph database but do not have time to evaluate both products, I’d recommend checking OrientDB.

Document databases

Document databases (also called document stores) store documents encoding data using e.g. XML, YAML, and JSON/BSON (or also as PDF or Microsoft Office files). The documents are stored as collections of documents. These collections are similar to tables in relational databases (the documents being the records in those tables) but unlike relational databases, the document in these collections do not need to have the same schema. They can actually have completely different fields.

Each document in the database has a unique key used to identify it. But unlike key-value stores, document stores also provide functionality to retrieve documents based on their contents (even though all documents do not have the same attributes/fields).

There are quite a few document databases matching my above requirements.

MongoDB by 10gen

It is released under the AGPL. It’s the most well-known document database on the market. It is used by Craigslist, Foursquare and Shutterfly.
It comes with a lot of functionality:

  • predefined datatypes
  • indexes
  • JavaScript server-side scripting
  • partitioning
  • master-slave replication
  • MapReduce
  • eventual and immediate consistency
  • atomic operations within one document

MongoDB supports ad-hoc queries pretty well and its query tools support a lot of what can be done in SQL (of course with the exception of joins). So if you have experience working with an SQL relational database, you should be able to get used to it pretty quickly.

CouchDB by the Apache Software Foundation

It is released (as expected) under the Apache license. It’s a document store inspired by Lotus Notes. It is used by quite a few organizations no big names like MongoDB or Couchbase.

Compared to MongoDB, it does support a few more operating systems (e.g. Android and BSD). But it does not support the following:

  • predefined datatypes
  • immediate consistency

But it does support the following which is not supported by MongoDB:

  • triggers
  • master-master replication

CouchDB is a single node solution with peer-to-peer replication technology and is better suited for decentralized systems. So if you do not need immediate consistency and need master-master replication to build a decentralized system, CouchDB might be a better fit than MongoDB.

Couchbase by Couchbase, Inc.

It is a JSON-based document store derived from CouchDB with a Memcached-compatible interface and is released under an Apache license. It is used by many companies including Adidas, Adobe, Aol, BMW, Cisco, Ebay, Intel, Mozilla, Nokia, Vodafone and Zynga.

Compared to MongoDB it doesn’t support deployment on Solaris and also lacks predefined datatypes.

Being based on CouchDB, it also supports triggers and master-master replication but also supports immediate consistency like MongoDB (and which isn’t supported by CouchDB).

Couchbase additionally has a built-in clustering system and can spread data automatically across multiple nodes.

Also since Couchbase provides built-in Memcached-based caching, it is usually better suited for use cases where low latency or high throughput is a requirement.

If easy scalability and high throughput are important to you but you do not want to sacrifice immediate consistency, then Couchbase might be the right solution for you.

But you should keep in mind that Couchbase is not entirely open-source. There are two versions: Community Edition (free but no latest bug fixes) and Enterprise Edition (with additional restrictions). If you plan to use the Enterprise Edition, you should carefully read the license terms.

ArangoDB

ArangoDB is released under an Apache license. It supports both disk and RAM-based storage of JSON data.

Compared to MongoDB, it does not provide support for MapReduce but supports the ArangoDB query language which allows using aggregation, graph queries, grouping, joins, list iteration, results filtering, results projection, sorting and variables. It also supports ACID transactions.

An advantage of ArangoDB is that it supports database models based on graphs, key-values and documents.

Summary

Since our goal is to store multiple XML document types which might have different schemas and need to be able to generate reports based on their contents, our obvious choice is to go for a document oriented database. This leaves us with 4 database products to choose from:

  1. MongoDB
  2. CouchDB
  3. Couchbase
  4. ArangoDB

Since not all non-functional requirements are yet available, we need to try and define sets of possible requirements and for each of them define the corresponding product which would be our favorite.

First, ArangoDB seems to be a very good product from it’s supported functionality and architecture. But it’s quite a new product (initial release in 2011) and doesn’t have the same kind of user community the other 3 products have. Looking at Google Trend, you will also get the following figures for March 2014:

  • MongoDB: 100
  • CouchDB: 8
  • Couchbase: 7
  • ArangoDB: 0

Of course, if you read this article in a few years from now, the situation will most probably be different. Looking into my crystal ball, I’d say that in a year or two from now, ArangoDB will be up a little bit, interest for CouchDB will further move to Couchbase and MongoDB will still be number one but by not as much as now.

Also since scalability is very important, we would rather tend to use Couchbase than CouchDB. Of course we need to further analyze the differences between the Community and Enterprise editions and also check the exact terms of the license for the enterprise edition.

The only two thing speaking against MongoDB seem to be:

  1. The AGPL license which I’ve always found scary.
  2. Scaling with MongoDB seems to be more complex than with Couchbase.

Right now, I am not 100% sure whether to go for Couchbase or MongoDB. Both seem to meet all our requirements and we probably need to give them both a try and see which one is the perfect fit.

PHP and MySQL: SQL injection

First let’s have a look at what SQL injection is about. SQL injection means that an attacker is injecting some pieces of SQL code in a call to a server instead of just sending some text information in order to go around security mechanisms or in order to perform something which shouldn’t be allowed.

Here’s a very simple example. Let’s say you have a very poorly programmed login function which is called with two parameter, a user name and a password. If you take the parameters and built an SQL statement like this:

$query = "SELECT 1 FROM users WHERE user_id='".$user_name."' AND password='".$password"'"

An attacker may send the following:

  • User name: admin' —
  • Password: anything

The generated SQL query would be:

SELECT 1 FROM users WHERE user_id='admin' --' AND password='anything'

The double dash would make the rest of the line a comment and the statement would always return 1 allowing the attacker to login as admin without valid credentials.

An easy fix for this security issue is not to return 1 but to return an MD5 of the password and compare it with the password provided. Unfortunately, it is also trivial to workaround such security fixes. Let’s say you statement now looks like this:

SELECT password FROM user WHERE user_name='xxx'

All the attacker has to do is to use the following username: admin' AND 1=0 UNION SELECT 'known_md5_checksum

But SQL injection is not only used to be able to login without credential. It can be used to perform actions which are not intended to be allowed. This is typically done by using batched queries i.e. closing the first query and having a second query executed which would either return sensitive information or destroying something e.g. using a user name like: admin'; DROP TABLE important_table —

The first statement will be executed normally and the drop table will then be executed additionally. Fortunately, when using PHP and MySQL these kind of batched queries are not supported. The execution will fail since you can only have one statement executed at a time. But it you used PostgreSQL instead of MySQL it would be possible.

Another thing which is often done using SQL injection is getting access to data in the database which should be protected. This is usually done using a kind of UNION injection. The idea behind it is that:

  • the server you are attacking is fetching data using a query and displaying this data in a tabular form
  • you inject a UNION clause to fetch data from another table
  • the data from both table are displayed in the table on the client

Let’s say you have an order table and you can call the server to display all items in a particular order with such a statement:

$statement = "SELECT name, value FROM items WHERE order_id=".$order_id;

If the attacker now sends the following as order_id, he will get a list of all users and their passwords: 1 UNION SELECT name, password FROM users

I do hope the passwords will be at least encrypted but encryption alone is not always enough to protect data. You also need to make sure the encrypted data cannot be accessed that easily.

Of course an attacker will need to know which kind of statements are executed by your software in order to exploit such a security hole. But it might not be as difficult as you think… I’ll post another article about how you can manage to get information about the query being executed later.

Many database engines also provide the functionality to execute commands in the operating system from SQL. The commands are executed using the user running the database engine. It’s sometimes very useful but in an SQL injection scenario it may allow an attacker to not only steal information or damage the database but also the operating system. Fortunately, in our case the xp_cmdshell command used to do this (MSSQL Server and Sybase) does not exist in MySQL.

So what is to be done to protect yourself against SQL injection attacks ?

First if you use MSSQL Server or Sybase: to prevent an attacker from destroying the whole server or prevent him from getting access to any file on the computer, you should disable xp_cmdshell or run the database engine with a user with very limited access rights to the rest of the system.

Now, let’s get back to MySQL and PHP. Here you should use Use prepared statements and parameterized queries using PDO or Mysqli. These statements are sent to the database engine and are parsed independently of anything else. Like this it is not possible to inject SQL code in a parameter.

An example using Mysqli:

$stmt = $db->prepare('SELECT password FROM users WHERE user_name = ?');
$stmt->bind_param('s', $user_name);
$stmt->execute();
$result = $stmt->get_result();
while ($row = $result->fetch_assoc()) {
    // compare the returned value with the MD5 sum of the provided password
}

And using PDO:

$stmt = $pdo->prepare('SELECT password FROM users WHERE user_name = :user_name');
$stmt->execute(array('user_name' => $user_name));
foreach ($stmt as $row) {
    // compare the returned value with the MD5 sum of the provided password
}

Note that PDO is an extension but it is bundled by default since PHP 5.1 and a MySQL driver is also available by default.

This is basically the best way to secure your software. In case you cannot use PDO or Mysqli, there are other techniques to prevent SQL injection attacks. I’ll list them in an update to this post in a few days.

Sybase ASE: Left outer join

Sybase ASE supports both the old syntax and the newer SQL-92 syntax for left outer joins:

Old syntax:

SELECT * FROM table1, table2 WHERE table1.key*=table2.fkey

New syntax:

SELECT * FROM table1 LEFT JOIN table2 ON table1.key=table2.fkey

As long as you do not have other criteria, the results will be the same. But you might experience some differing results as soon as you add some other criteria e.g. the two following statements seem to do the same but do deliver different results:

1> select top 10 p.p_key, e.e_uid, e.field1 from table_p p, table_e e where p.p_key*=e.p_key and e.field1='V1'
2> go
 p_key       e_uid            field1
 ----------- ---------------- ----------------
           2 2005092612595815 V1
           2 2005030715593204 V1
           2 2005092614251692 V1
           4 NULL             NULL
           8 NULL             NULL
           9 NULL             NULL
          10 NULL             NULL
          11 NULL             NULL
          14 NULL             NULL
          15 NULL             NULL

(10 rows affected)
1> select top 10 p.p_key, e.e_uid, e.field1 from table_p p left join table_e e on p.p_key=e.p_key where e.field1='V1'
2> go
 p_key       e_uid            field1
 ----------- ---------------- ----------------
     1057606 2006100212531641 V1
     1093639 2006100215370890 V1
     1015380 2006100410065929 V1
     1093639 2006100215370949 V1
     1029807 2006100508354802 V1
     1029807 2006100508402832 V1
     1044378 2006100509331826 V1
     1092232 2006100510385895 V1
     1030314 2006100513585134 V1
     1093947 2006100606211859 V1

(10 rows affected)

The reason is that the database engines when executing the first statement does not only consider p.p_key=e.p_key as join criterion but also e.field1='V1'. So basically the first statement is equivalent to the following SQL-92 statement:

1> select top 10 p.p_key, e.e_uid, e.field1 from table_p p left join table_e e on p.p_key=e.p_key and e.field1='V1'
2> go
 p_key       e_uid            field1
 ----------- ---------------- ----------------
           2 2005092612595815 V1
           2 2005030715593204 V1
           2 2005092614251692 V1
           4 NULL             NULL
           8 NULL             NULL
           9 NULL             NULL
          10 NULL             NULL
          11 NULL             NULL
          14 NULL             NULL
          15 NULL             NULL

(10 rows affected)

Note that the second criterion is not in the where clause but in the on part.

So the old left outer join syntax is compacter but it is ambiguous as it doesn’t properly separate the join criteria and the where criteria. In case of a left outer join it makes a huge difference since the join criteria do not filter the returned rows but the where criteria do.

In most cases, the results you were after are the ones returned by the first and last queries above. But you should avoid the old left outer join syntax and try to use the SQL-92 syntax everywhere. It makes it clearer what you mean with the statement and can save some time searching why you did not get the output you were expecting. But also with the SQL-92 syntax you should carefully think whether you want to add a criterion to the join criteria or to the where clause (and as stated above in most cases when using a left outer join, the criteria on the joined tables should probably go in the join criteria).

Sybase ASE Cookbook

I’ve compiled in this short ebook all posts I’ve ever written regarding Sybase ASE and tried to cluster them a little bit. This ebook is full of information useful when you work daily or once in a while with Sybase ASE. Sybase ASE is a high-performance and scalable database management system but it is also complex and not always easy to understand.

After working for over 10 years with ASE, I’ve gathered a lot of information I share on a daily basis with colleagues and once in a while on my blog at http://benohead.com.

I provide this ebook in the hope to be helpful. It is thus available for free. Since I’m not a professional writer, I do not have a crew of people proof-reading it. So spelling might be not as good as it should be and I can’t be sure that everything in there is 100% accurate. If you find any mistake, please contact me at henri.benoit@gmail.com and I’ll do my best to update it.

You can download the cookbook here.

Sybase ASE: check contents of the procedure cache

In order to peek into the procedure cache, you can use the following dbcc command:

dbcc procbuf

In order to see the output on the console, use:

dbcc traceon(3604)
go
dbcc procbuf
go

You’ll see that the output is pretty extensive. If what you are after is which trigger and procedures are using space in the procedure cache and how much space it uses, you only are interested in the lines like:

...
    Total # of bytes used               : 1266320
...
pbname='sp_aux_getsize'   pbprocnum=1
...

You can thus execute it and grep for these two lines:

$SYBASE/OCS/bin/isql -Usa -Pxxxx << EOT | grep "pbname
Total # of bytes used"
dbcc traceon(3604)
go
dbcc procbuf
go
EOT

You of course need to replace xxxx by your actual password.

Then you will want to make it looks nicer:

  • Merge the two lines: awk ‘!(NR%2){print p” “$0}{p=$0}’
  • Display only the name and the size: awk ‘{ print $1″ “$9″ bytes”; }’ | sed “s/pbname=//g” | sed “s/’//g”
  • Sort by size: sed “s/’//g” | sort -k2 -n

Putting it all together:

$SYBASE/OCS/bin/isql -Usa -Pxxxx << EOT | grep "pbname
Total # of bytes used" | awk '!(NR%2){print p" "$0}{p=$0}' | awk '{ print $1" "$9" bytes"; }' | sed "s/pbname=//g" | sed "s/'//g" | sort -k2 -n 
dbcc traceon(3604)
go
dbcc procbuf
go
EOT

You will then see something like:

...
sp_jdbc_tables 62880 bytes
sp_getmessage 68668 bytes
sp_aux_getsize 80596 bytes
sp_mda 81433 bytes
sp_mda 81433 bytes
sp_drv_column_default 90144 bytes
sp_dbcc_run_deletehistory 133993 bytes
sp_lock 180467 bytes
sp_helpsegment 181499 bytes
sp_dbcc_run_summaryreport 207470 bytes
sp_modifystats 315854 bytes
sp_autoformat 339825 bytes
sp_spaceused 353572 bytes
sp_jdbc_columns 380403 bytes
sp_do_poolconfig 491584 bytes
sp_configure 823283 bytes

Sybase ASE: Using archive databases

Archive databases are used to access data from a backup file directly without having the restore the database. Let’s say you lost some data in a table but had many other changes to other tables since the last backup. Just loading the last backup is not an option since you’d lose everything since the last backup. Of course, if you work with transaction log dumps, you can reduce the loss of data but very often it’s still too much. Additionally, in some cases you know the data you want to reload have not changed since the last backup (i.e. some kind of master data). So the best solution would be to be able to keep the current database but just reload this one table. Or maybe you do not want to reload a complete table but just copy a few deleted lines back in a table.

That’s exactly what an archive database is for. You cannot dump an archive database. An archive database is just a normal database dump loaded in a special way so that you can access the data without having to do a regular load of the dump which would overwrite everything.

So what do you need in order to mount a database as an archive database. Well, you need two additional databases:

  1. A “scratch database”
  2. An archive database

The “scratch database” is a small database you need to store a system table called sysaltusages. This table maps the database dump files you are loading to the archive database.

The archive database is an additional database you need to store “modified pages”. Modified pages are pages which are created additionally to the pages stored in the dump files. These are e.g. the result of a recovery performed after loading the database dump. So this database is typically much smaller than the dump files you are loading. But it is difficult to tell upfront how big it will be.

So once you have loaded an archive database, the data you see come from these three sources:

  • The loaded dump files
  • The scratch database
  • The archive database

So let’s first create the two database (I assume here you have some devices available to create these databases).

First we need to create the scratch database:

use master
go
create database scratchdb on scratch_data_dev='100M' log on scratch_log_dev='100M'
go

This will create the scratch database and take in online. Then we need to mark this database as a scratch database:

sp_dboption 'scratch', 'scratch database', 'true'
go
use scratch
go
checkpoint
go

Then we need to create the archive database:

use master
go
create archive database archivedb on archive_data_dev='100M' with scratch_database = scratchdb
go

Replace scratch_data_dev, scratch_log_dev and archive_data_dev by the names of the devices you want to create the data and log fragments of the scratch database and the data fragment of the archive database.

Now we’re ready to load the dump. Just do it the way you would load the database to restore it but only load it to the just created archive database e.g.:

load database archivedb from '...'

You can of course also load the transaction logs with:

load transaction archivedb from '...'

Note that while loading the database dump or the transaction log dumps, you might get error message saying that either the transaction log of the scratch database or the modified pages section of the archive database run full e.g.:

There is no more space in the modified pages section for the archive database ‘pdir_archive_db’. Use the ALTER DATABASE command to increase the amount of space available to the database.

Depending on the message you get, you’ll have to add more space for the transaction log of the scratch database or extend the archive database using alter database. Note that ASE usually gives you a chance to do it before aborting. But at some point in time, it will probably abort, so do not take your time 😉

If you do not care about the recovery and have limited storage available for the archive database you can use:

load database archivedb from '...' with norecovery

Loading with norecovery also reduces the time required to load. Also the database is automatically brought online (this also means you cannot load additional transaction logs). The downside is that the database might be inconsistent (from a physical and transactional point of view).

If you did not use the norecovery option, you have to bring the archive database online:

online database archivedb

Once you are done with the archive database and do not need it anymore, you can just drop both databases:

drop database archivedb
drop database scratchdb