Archive for the 'UNIX' Category

25
Mar

Why I love strace

Strace is a tool that should be in the toolbox of every system administrator. Not only can it help in troubleshooting simple problems (e.g. missing libraries in a newly created chroot, which ldd mysteriously fails to report), but it also helps in debugging very complex system problems and performance issues.

Recently I experienced a very strange problem with one of our RHEL 3 servers. It manifested in a very strange way: SSH and su logins hung, other daemons were also hanging during startup, the only way to reboot or shut down the server was to physically press the restart/power button, etc. All this could have been caused by problems at either the software or the hardware level. The first suspect was a bad RAID controller, but after tests this proved to be a false lead. After more tests and brainstorming, hardware problems were definitely excluded, so the problem had to be on the software side. But what could it be?

After a few more false leads I tried to trace the system calls made by the su command and found very interesting results.

$ strace -f -s 1024 -o /tmp/su.strace.out su -
[– cut –]
3138 open("/dev/audit", O_RDWR) = 3
3138 fcntl64(3, F_GETFD) = 0
3138 fcntl64(3, F_SETFD, FD_CLOEXEC) = 0
3138 ioctl(3, 0x801c406f

And this is where the strace output ends and the su command hangs. The audit device file is opened (file descriptor 3), and as soon as the first request is dispatched to this device (the ioctl system call on file descriptor 3) the command freezes. This suggests that disabling audit on the server should make the problem go away.
As a test, I temporarily stopped the audit daemon and tried to switch to another user, and the problem was indeed gone.
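
For longer traces, scrolling to the end by hand gets tedious. Here is a minimal sketch of a helper that prints, for each traced process, the last system call that never returned. The function name pending_syscalls is my own invention, and it assumes the trace was written with -f and -o as above, so that every line starts with a PID. Treat it as a heuristic, not a definitive parser.

```shell
# pending_syscalls FILE
# Hypothetical helper: for every PID in an "strace -f -o FILE" trace,
# print the last system call line if it has no " = result" part.
# Heuristic: a string argument containing " = " would fool it.
pending_syscalls() {
    awk '
        # Skip exit and signal notes such as "3138 +++ exited with 0 +++".
        $2 == "+++" || $2 == "---" { next }
        # Remember the most recent system call line of each process.
        { last[$1] = $0 }
        END {
            for (pid in last)
                if (last[pid] !~ / = /)
                    print last[pid]
        }
    ' "$1"
}
```

Run as pending_syscalls /tmp/su.strace.out, it should point straight at the ioctl call on the audit device.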

After searching for similar problems with the audit daemon I found an article in the Red Hat knowledge base describing exactly the same issue (http://kbase.redhat.com/faq/FAQ_79_6169.shtm).
From the article:

When the free space in the filesystem holding the audit logs is less than 20%, the above notify command will error out and auditd will enter suspend mode. This causes all system calls to block.

So this behavior is not a bug but an actual feature of the software. :o) From a security point of view this is expected behaviour: an attacker could fill up the filesystem where the audit logs are stored before the attack, and audit would be disabled, meaning no logs of his activity. So it is better not to allow ANY activity on the system if audit is not able to write to its logs. But still, this kind of behaviour renders the system completely useless to legitimate users.
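
If you want a warning before auditd gets anywhere near that limit, a check along these lines could run from cron. This is only a sketch: the function names are mine, the 20% default comes from the article quoted above, and you pass in whatever directory holds your audit logs (I am not assuming a path here). "Free" is approximated as 100 minus the used percentage that df reports, which ignores reserved blocks.

```shell
# fs_free_percent PATH
# Print the approximate free percentage of the filesystem holding PATH,
# derived from the Capacity column of POSIX-format df output.
fs_free_percent() {
    df -P "$1" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print 100 - $5 }'
}

# check_log_space PATH [MIN_FREE_PERCENT]
# Hypothetical helper: returns 1 and prints a warning when free space
# is below the threshold (default 20), 2 if df gave no usable answer.
check_log_space() {
    dir=$1
    min=${2:-20}
    free=$(fs_free_percent "$dir")
    [ -n "$free" ] || return 2
    if [ "$free" -lt "$min" ]; then
        echo "WARNING: only ${free}% free on the filesystem holding $dir"
        return 1
    fi
    echo "OK: ${free}% free on the filesystem holding $dir"
}
```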

The topic of this post is not audit, so I will stop here. The important thing is that strace led us directly to the source of the problem. Resolving issues like this would be much more complex and time-consuming without this great little tool. :)

05
Sep

Easy way to read MBR?

$10 question. Some time ago you created a backup of your system’s Master Boot Record (MBR). Now, after some change, you notice you made a fatal mistake: your partition table is corrupted and you need to recover it from the backup, but you are not sure whether it is the correct version. The question is, what is the easiest way to read the partition table from the backup of your MBR? No, a hex editor is not the easiest way to do it (and it is bad for your eyes :)). I wonder how many of you said the ‘file’ command? Yes, the magical file command is able to read the data from the MBR dump and print the actual partition table. Here is an example from my laptop.

# file mbr.bin
mbr.bin: x86 boot sector;
partition 1: ID=0x83, active, starthead 1, startsector 63, 40949622 sectors;
partition 2: ID=0x82, starthead 254, startsector 40949685, 2088450 sectors;
partition 3: ID=0x8e, starthead 254, startsector 43038135, 74172105 sectors, code offset 0x48

As you can see, I have only 3 partitions on the disk. The first one has type 0x83, which is the hex ID for a Linux native partition; it holds my ext3 / filesystem (you don’t see it here, but I know it :)). It is also the active partition, which means it is the one used for booting the system. You can also see the size of the partition in sectors. Knowing that one sector is 512 bytes long, you can easily find the size of the partition: dividing the sector count by 2 gives kilobytes, and dividing by 1024 again gives megabytes.

# echo $(((40949622/2)/1024))
19994
# df -k /
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 19833488 6504676 12305072 35% /

Close enough. :) (df reports a bit less because the filesystem itself uses some of the partition.)
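
The same arithmetic works for the other partitions too, so it can be wrapped in a tiny function. A minimal sketch; the name sectors_to_mib is made up, and the 512-byte default is the sector size assumed throughout this post.

```shell
# sectors_to_mib SECTORS [BYTES_PER_SECTOR]
# Convert a sector count from the file output into whole megabytes.
# The sector size defaults to 512 bytes; the result is rounded down.
sectors_to_mib() {
    echo $(( $1 * ${2:-512} / 1024 / 1024 ))
}

# Sizes of the three partitions listed above:
sectors_to_mib 40949622   # prints 19994
sectors_to_mib 2088450    # prints 1019
sectors_to_mib 74172105   # prints 36216
```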

The next partition is 0x82, which is a swap partition. And the last partition is 0x8e, which is the ID for a Linux LVM partition.

While I am here, I could also explain what the MBR is and how it is used.

The Master Boot Record (MBR) resides in the first 512 bytes of your bootable disk. Besides the partition table it also holds the bootloader and something called a magic number. As you can see in the picture, the bootloader takes the biggest part of the MBR, a whole 446 bytes. During the boot process the BIOS searches for bootable devices attached to your system, and once it finds one it looks at the MBR and loads the bootloader, also called the primary bootloader. The primary bootloader looks at the partition table inside the MBR (the next 64 bytes after the bootloader) and searches for an active partition. When it finds the active partition it loads the secondary bootloader from that partition’s boot record which, in turn, loads the kernel, and so on.

The magic number is used as a sanity check of your MBR. It is only 2 bytes long and should be 0xAA55 (stored on disk as the bytes 55 AA).
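
The layout described above (446 bytes of bootloader, 64 bytes of partition table, 2 bytes of magic number) is easy to poke at with dd and od. Here is a rough sketch for inspecting an MBR dump; the function names are hypothetical, and it assumes the file is a plain 512-byte dump like the one created at the end of this post.

```shell
# mbr_signature FILE
# Print the last two bytes of a 512-byte MBR dump as hex, e.g. "55aa".
mbr_signature() {
    dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -A n -t x1 | tr -d ' \n'
}

# has_mbr_signature FILE
# Succeed only if the dump ends with the bytes 55 AA
# (0xAA55 when read as a little-endian 16-bit number).
has_mbr_signature() {
    [ "$(mbr_signature "$1")" = "55aa" ]
}

# mbr_partition_table FILE
# Hex-dump the 64-byte partition table that follows the 446-byte
# bootloader: four entries of 16 bytes, one entry per output line.
mbr_partition_table() {
    od -A d -t x1 -v -j 446 -N 64 "$1"
}
```

For example, has_mbr_signature mbr.bin && mbr_partition_table mbr.bin prints the raw table only when the sanity check passes. The file command is still the friendlier way to read it.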

So, in short, the MBR is used to easily locate and load the kernel from the correct device. (It is also used by your operating system to find the layout of the disk, but that is another story.)

PS: You can create a dump of your MBR by issuing the following command:

# dd if=/dev/sda of=mbr.bin bs=512 count=1

Replace /dev/sda with the correct path to your disk device.

PS 2: Sorry for the bad quality of the MBR diagram, but I didn’t have much time to work on it and I am not a graphic designer. :D

26
Jul

HP-UX UNIX95 Compatibility

HP-UX is well known for the ease of patch and product manipulation. These operations are done via software called Software Distributor (SD). Situations where SD fails are very rare, but they can be very strange.

One of those weird situations happened to me last week. I downloaded a patch bundle from the HP site and tried to create a depot. A very simple action: untar the bundle, run the create_depot_hp-ux_11 script, and the script and SD will do all the necessary things. But here comes the weird part: a checksum error for all patches in the bundle.

# create_depot_hp-ux_11
DEPOT: /var/depot
BUNDLE: BUNDLE
TITLE: Patch Bundle
UNSHAR: y
PSF: depot.psf
Expanding patch shar files…
x - PHCO_23651.text
x - PHCO_23651.depot [compressed]
ERROR: wc results of PHCO_23651.depot are 7082 23582 522240 should be 7082 18520 522240
x - PHKL_18543.text
x - PHKL_18543.depot [compressed]
ERROR: wc results of PHKL_18543.depot are 146386 592281 20377600 should be 146386 524212 20377600

I checked the checksum of the bundle itself and it seemed perfectly fine. What a puzzle, eh?

Here is the story. HP-UX is supposed to be compatible with the UNIX95 specification, but for some reason this compatibility mode breaks SD. The mode is switched on by an environment variable called UNIX95. So if you ever notice a problem like this, first check whether this variable is set on your server, and if it is, simply unset it and SD will be fully functional again.

# set|grep UNIX95
UNIX95=yes
# unset UNIX95
# create_depot_hp-ux_11
DEPOT: /var/depot
BUNDLE: BUNDLE
TITLE: Patch Bundle
UNSHAR: y
PSF: depot.psf
Expanding patch shar files…
x - PHCO_23651.text
x - PHCO_23651.depot [compressed]
x - PHKL_18543.text
x - PHKL_18543.depot [compressed]
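
If something else on the box relies on UNIX95 being set, you may not want to unset it for the whole session. A small sketch of a wrapper that removes the variable for a single command only; without_unix95 is a name I made up. It unsets the variable rather than emptying it, on the assumption that the tools only check whether it is set at all.

```shell
# without_unix95 COMMAND [ARGS...]
# Hypothetical wrapper: run one command with UNIX95 removed from its
# environment. The subshell keeps the caller's own setting untouched.
without_unix95() {
    ( unset UNIX95; "$@" )
}
```

Usage would be something like without_unix95 ./create_depot_hp-ux_11, after which set|grep UNIX95 in your shell still shows the original value.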

Happy patching! :)

12
Jul

AIX 6 ready for download!

As I previously announced, the IBM AIX 6 Beta is openly available for free download and testing. The time has come and you can start downloading it right now from this page. More info here.

AIX 6 should bring a lot of new stuff, especially when it comes to virtualization and high availability. Some new features are ported directly from fault-tolerant systems, which should make for even more stable and reliable systems. There will be no official support for Beta testing, but you can ask for help on one of the IBM forums.

Openness at IBM is a pretty new thing. This change in IBM policy is probably influenced by Sun’s opening of Solaris to the community. But even though some changes have started, IBM is still far away from open source and from opening the code of its products to the open source community. And that is a pity, because I would really like to see the same usability features on some other UNIX operating systems. Sadly, even Linux is far behind AIX when it comes to usability.

08
Jul

32 * 2 = 16h

Last week I had an interesting assignment: upgrading one AIX 5.2 server from a 32bit to a 64bit kernel. The process should be pretty straightforward and is very nicely explained in the AIX documentation, but as usual, all actions that require stopping applications have to be done after working hours, in this case after 9pm. Considering that all the changes, the system reboot and the application stop/start sequence should not take more than 45 minutes, this is not a big problem. As many times before, I didn’t count on the good ol’ friend of all system administrators: Murphy.

But let’s start from the beginning. The first thing I did was check whether the server supports a 64bit environment and which version of the kernel is currently running.

# bootinfo -y
64
# bootinfo -K
32

So, the hardware on this server is 64bit (as expected) and the active kernel is 32bit. Now, let’s stop the applications. The only important application on this server is a production Oracle database. We have to stop it before the reboot. (An important thing to note at this point is the version of the database: it is the old 8.1.7.4 release of Oracle.)

# su - oracle
% sqlplus /nolog
   
SQL*Plus: Release 8.1.7.0.0 - Production on Wed Jul 4 21:01:20 2007
   
(c) Copyright 2000 Oracle Corporation. All rights reserved.
   
SQL> conn / as sysdba
Connected.
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> exit
Disconnected from Oracle8i Enterprise Edition Release 8.1.7.4.0 - Production
JServer Release 8.1.7.4.0 - Production

In order to be able to execute 64bit binaries we must edit /etc/inittab so that the syscall64 kernel extension is loaded during boot. This is needed even with the 64bit kernel.

# mkitab "load64bit:2:wait:/etc/methods/cfg64 >/dev/console 2>&1"

The switch to the 64bit kernel is done by simply relinking the paths to the kernel and libraries and updating the boot image on the boot device, followed by a reboot. Simple as that.

# ln -sf /usr/lib/boot/unix_64 /unix
# ln -sf /usr/lib/boot/unix_64 /usr/lib/boot/unix
# bosboot -a
# shutdown -Fr
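
Before the reboot it does not hurt to confirm that both links really point at the 64bit kernel. A small sketch; link_points_to is a hypothetical helper of mine, and it parses ls -l output because I would not count on readlink being installed on every AIX box.

```shell
# link_points_to LINK TARGET
# Succeed only if LINK is a symbolic link whose target is exactly TARGET.
# The target is taken from the text after "->" in the ls -l output.
link_points_to() {
    [ -L "$1" ] && [ "$(ls -l "$1" | sed 's/.* -> //')" = "$2" ]
}

# Intended use on the server, run just before "shutdown -Fr":
#   link_points_to /unix /usr/lib/boot/unix_64 &&
#   link_points_to /usr/lib/boot/unix /usr/lib/boot/unix_64 &&
#   echo "both links point at the 64bit kernel"
```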

After the reboot, I checked the version of the running kernel to see if the change actually took place.

# bootinfo -K
64

Perfect! So simple, isn’t it? I just love it when things go so smoothly. Now let’s start Oracle.

# su - oracle
% sqlplus /nolog
Could not load program sqlplus:
Symbol resolution failed for sqlplus because:
Symbol pw_post (number 272) is not exported from dependent module /unix.
Symbol pw_wait (number 273) is not exported from dependent module /unix.
Symbol pw_config (number 274) is not exported from dependent module /unix.
Symbol aix_ora_pw_version3_required (number 275) is not exported from dependent module /unix.
Examine .loader section symbols with the 'dump -Tv' command.

“Argh, this can’t be happening!” I thought, so I tried again. Surprisingly, that didn’t help. After the initial shock, I looked at the message more carefully and tried to figure out what the hell it meant. The kernel doesn’t export the symbols Oracle needs, so maybe the Oracle kernel extension is not loaded. Let’s check.

# loadext -r
   
Oracle Kernel Extension Loader for AIX
Copyright (c) 1998,1999 Oracle Corporation
   
sh: /usr/sbin/crash: not found
No Kernel Extension is currently running.

I was on the right trail. But this is strange: the Oracle kernel extension is loaded from /etc/inittab during boot, so it SHOULD be loaded. Maybe the inittab got corrupted.

# lsitab -a|grep ora
orapw:2:wait:/etc/loadext -l /etc/pw-syscall

It is there. In desperation I thought maybe the syscall64 extension was not loaded, so the Oracle one failed (although it should not matter).

# genkex|grep syscall
4635e70 390 /usr/lib/drivers/syscalls64.ext

It is there. Let’s try to load the Oracle extension manually, maybe it will work now.

# loadext -l /etc/pw-syscall
   
Oracle Kernel Extension Loader for AIX
Copyright (c) 1998,1999 Oracle Corporation
   
Kernel Extension Version: 3
SYS_SINGLELOAD: Exec format error
kmid: 0 (0x0)
path: '/etc/pw-syscall'
libpath: ''

Maybe this extension does not support the 64bit environment?

# strings /etc/pw-syscall|head -3
Kernel Extension Version: 3
$Revision: 1.9 $
Supported Oracle Instances: 32-bit & 64-bit

Now I am puzzled even more.

At this point I felt stuck. Reverting to the 32bit kernel was not even an option, as this was only one part of a big migration process on this server. But on the other hand, Oracle had to be up and running by morning, as this is a very important production server. As I am not an Oracle guru and there was no one from the DB team around to ask for advice, I asked Google for help. As many times before, it proved to be a wise choice. People had already had this problem and solved it by applying a small patch for Oracle.

The important thing here is that Oracle version 8 does not support the 64bit kernel on AIX out of the box. It requires patch number 2896876 in order to do so.

After applying this patch you get a new kernel extension, which loads without complaining.

# genkex|grep syscall
466c850 1218 /etc/pw-syscall64
4641ec0 390 /usr/lib/drivers/syscalls64.ext

Now, let’s try to start Oracle.

# su - oracle
% sqlplus /nolog
   
SQL*Plus: Release 8.1.7.0.0 - Production on Thu Jul 5 00:47:45 2007
   
(c) Copyright 2000 Oracle Corporation. All rights reserved.
   
SQL> conn / as sysdba
Connected to an idle instance.
SQL> startup
ORACLE instance started.
   
Total System Global Area  178704276 bytes
Fixed Size                    73620 bytes
Variable Size             135630848 bytes
Database Buffers           41943040 bytes
Redo Buffers                1056768 bytes
Database mounted.
Database opened.
SQL> exit
Disconnected
% ^D

Nice. :) The next thing is to change inittab so that it loads the new Oracle kernel extension:

# chitab "orapw:2:wait:/etc/loadext -l /etc/pw-syscall64"

Then stop Oracle and reboot the server again to see how it behaves after the reboot. Luckily everything worked fine, so at 1am I could finally go home. It was about time, since I had been there for almost 16 hours (hence the title of the post). Ah, flexible working hours are one of the pleasures of being a system administrator, aren’t they? :)

01
Jul

Usability… WTF is that?

One of the very important things when it comes to software development is usability. Software should be user friendly and easy to use. Contrary to widespread opinion, software for system administration should not be an exception. After all, system administrators are still humans (although some people don’t agree with that :)). So it has always been a mystery to me why some OS developers, or at least developers of user space tools, try to complicate things as much as they can.

A perfect example of this is Veritas Volume Manager. Other UNIX LVM technologies provide very logical and simple-to-use tools for LVM administration, but it seems that VxVM follows a “the more confusing, the better” philosophy. Take the simple activity of checking how much free space is left in a Disk Group (Volume Groups are called Disk Groups in Veritas Volume Manager :)).

# vxdg -g rootdg free
GROUP  DISK     DEVICE   TAG    OFFSET   LENGTH  FLAGS
rootdg rootdisk c1t0d0s2 c1t0d0 46595904 96722880 -

Now, all fields are self-explanatory, but WTF are Offset and Length?! Well, Offset is the number of the block where the free space begins and Length is the size of the free space in blocks. I agree this is very informative and useful output, but why name the fields like this? Why not use simple names like, for example, “Used space” and “Free space”? Hm, beats me.

But the fun doesn’t end there. In case you don’t have any free space in your Disk Group, the vxdg command will not inform you about that; it will just output the header and exit. Very user friendly, isn’t it? :)
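
Until somebody at Symantec fixes this, a few lines of awk can turn that output into a number a human can read. This is just a sketch: vxdg_free_mib is my own name for it, and the 512 bytes per block default is an assumption that you should check for your platform. It sums the LENGTH column and prints 0 when only the header comes back.

```shell
# vxdg_free_mib [BYTES_PER_BLOCK]
# Read "vxdg -g <group> free" output on stdin and print the total free
# space in whole megabytes. The block size defaults to 512 bytes, which
# is an assumption. Header-only input (no free space) yields 0.
vxdg_free_mib() {
    awk -v bytes_per_block="${1:-512}" '
        NR == 1 { next }        # skip the header line
        { blocks += $6 }        # LENGTH is the sixth column
        END { printf "%d\n", blocks * bytes_per_block / 1048576 }
    '
}
```

With the output shown above, vxdg -g rootdg free | vxdg_free_mib prints 47227.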

Don’t get me wrong, I am not saying VxVM is a bad piece of software. I think it is very powerful, with features that many other volume managers lack. But people at Symantec could really hire a usability expert to work on VxVM; it would be the challenge of a lifetime. :)