Tuesday, December 22, 2009

vSphere PowerCLI Quick Start

Either the vSphere Command-Line Interface (vCLI) or vSphere PowerCLI can be used to manage ESX/ESXi hosts. vCLI is supported on both Windows and Linux clients; PowerCLI is supported on Windows clients only, but it is more powerful than vCLI.

Windows PowerShell basics

###Windows PowerShell supports wildcards
* ? []
###Windows PowerShell help
help get*
help get-vm
help get-vm -full

###list aliases
get-alias
#some common aliases
sort sort-object
ft format-table
fl format-list

###Variable, store result into variable
$var = get-process
#$var becomes an array and $var[0] is the first process, so it can be used in a foreach loop for more complex operations
foreach ($proc in $var) {  $proc.ProcessName}
#The same output can be produced with format-table
get-process | ft name

###sort, ascending is the default order
get-process | sort cpu -descending
###filter, find the notepad process
get-process | where-object { $_.name -eq "notepad" }

vSphere PowerCLI basics

###Install in the following order
- install Windows PowerShell on Windows XP/Windows 2003/Windows 2008.
- install vSphere PowerCLI

### First time use
#You will receive a certificate warning; type A to always run software from this publisher.
#After this you will receive another warning about script signing; run this command:
Set-ExecutionPolicy RemoteSigned
#Restart PowerCLI to check that the warning has disappeared.

### Login and execute command
Connect-VIServer -server ServerName
##Some useful commands

#list all vms and sort them by Memory in descending order
#The real column name is MemoryMB, but it is displayed as "Memory (MB)",
#so use the fl command to find out the real property names
get-vm vmname | fl
get-vm | sort MemoryMB -descending

#Restart all VMs that are currently powered on
#Don't use Restart-VM, because it is a hard power off and power on, not a graceful restart
get-vm | where {$_.powerstate -eq "poweredon"} | restart-vmguest

#Read hosts from file
#If you want to exclude some hosts from the previous example, you can certainly add more filter expressions, but I just want to show how to read a file.
#Save the VM names to a file
get-vm | where {$_.powerstate -eq "poweredon"} | foreach { $_.name } >d:\temp\host.list

#Edit the file and remove unwanted hosts
#Read the file and restart all hosts in the file; trimend removes trailing spaces
get-content d:\temp\host.list | foreach { $x=$_.trimend() ; restart-vmguest  $x }

PowerShell quick reference
VBScript-to-Windows PowerShell Conversion Guide
VMware: vSphere PowerCLI Blog

Monday, December 14, 2009

NFS/Samba alternative: Mount remote directory over SSH.

It is possible to mount a remote directory as a local file system over SSH by using SSHFS, which is based on FUSE, a library for creating filesystems in userspace. There are many other file systems based on FUSE.


Secure and easy to set up: only the client side needs SSHFS and FUSE installed; the server side just needs an SSH server with sftp support.


Performance penalty: the data transferred has to be encrypted and decrypted, which is time consuming and CPU intensive.


Search for and install sshfs using the package management tool of your Linux flavour.


- Mount any remote dir:

sshfs user@server:dir /mnt/sshfs

Only root and users in fuse group can use this command.

- Unmount sshfs

fusermount -u /mnt/sshfs


umount /mnt/sshfs (root user only)

Wednesday, November 4, 2009

How to exclude directories with find

To exclude directories from find output, piping to grep is the obvious choice, but it is not efficient. The following shows three ways to exclude directories with find itself.

#Sample directories and file, the aim is to exclude directory /tmp/test/test1

$find /tmp/test

(1) The simplest and most efficient way, but I could only get it working in ksh (bash needs "shopt -s extglob" for this pattern)

$find /tmp/test/!(test1) -type f

(2) With the not operator "!"

$find /tmp/test ! -path "/tmp/test/test1*" -type f

(3) With -prune, which excludes the preceding path

$find /tmp/test  -path  /tmp/test/test1 -prune  -o -type f -print
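Methods (2) and (3) can be compared side by side on a throwaway tree. The following is a minimal sketch, assuming a temporary directory created with mktemp and made-up file names (a.txt, skip.txt, keep.txt); it is only meant to show that both forms list the same files while -prune avoids descending into the excluded directory.

```shell
#!/bin/sh
# Hypothetical sandbox so the commands are safe to run anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/test/test1" "$demo/test/test2"
touch "$demo/test/a.txt" "$demo/test/test1/skip.txt" "$demo/test/test2/keep.txt"

# Method (2): filter by path; find still walks into test1 and discards the matches.
not_path=$(find "$demo/test" ! -path "$demo/test/test1*" -type f | sort)

# Method (3): -prune stops find from descending into test1 at all.
pruned=$(find "$demo/test" -path "$demo/test/test1" -prune -o -type f -print | sort)

echo "with ! -path:"
echo "$not_path"
echo "with -prune:"
echo "$pruned"

rm -rf "$demo"
```

On a large excluded directory the -prune form should be the cheaper of the two, because the excluded subtree is never traversed.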

Wednesday, October 14, 2009

Zenoss monitor Windows Server 2008 via WMI

Zenoss supports Windows SNMP and can get partition and interface information, but it can't get CPU/memory info that way. WMI scripts can get almost any info in Windows, and Zenoss supports Windows WMI via a ZenPack.
The agent account on the remote Windows Server doesn't need to be an admin user as long as the following privileges are granted.

Enable DCOM
The easy way is to add the user to group "Distributed COM users"

Alternatively, grant specific rights to the user
Start the DCOM GUI with the DCOMCNFG command -> Component Services -> Computers -> right-click My Computer and select Properties -> COM Security tab
Grant the user access permission and launch and activation permission.

Enabling Account Privileges in WMI
Computer Management -> Services and Applications -> WMI Control -> right-click and select Properties -> Security
Select CIMV2 under Root

Click the Security button and add the new user with:
Enable Account
Remote Enable

Allowing WMI through the Windows Firewall
Allow pre-defined rule: Windows Management Instrumentation (WMI)

Deny ssh interactive login but allow sftp

SSH interactive login needs a tty to be allocated, but sftp/scp doesn't. So you can disable SSH interactive login with the no-pty option in OpenSSH. The no-pty option is valid only with public key authentication, so you also have to disable password login for the user with the "passwd -l username" command.

I attempted to use the tty option of pam_listfile.so to achieve this, but found it impossible because the PAM tty name "ssh" is set for both ssh login and sftp.

All you need to do is put the no-pty option in ~/.ssh/authorized_keys. It must be on the same line as the public key; multiple options are separated by commas, e.g.

no-pty,no-X11-forwarding ssh-dss AAAAB3Nz ... key-comment

Another useful feature of public key authentication is the forced command: the command is invoked whenever the key is authenticated, whatever command the client asks for. It is a great security feature for remote execution, e.g. a backup job. You can also limit the client source with the "from=" option.

#Force to run command date only
$ cat /home/test/.ssh/authorized_keys
command="date" ssh-dss AAAAB3NzaC1kc3 ..

#the date command is executed even though the given command is ls
$ ssh test@localhost ls
Wed Oct 14 10:29:39 EST 2009

#a forced command can effectively disable SSH interactive login.
$ ssh test@localhost
Wed Oct 14 10:29:44 EST 2009
Connection to localhost closed.
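When one key needs to run more than a single fixed command, the forced command can point at a wrapper script. OpenSSH passes the command the client asked for in the SSH_ORIGINAL_COMMAND environment variable, so the wrapper can whitelist commands. A minimal sketch, assuming a hypothetical wrapper installed as command="/usr/local/bin/ssh-wrapper.sh" and an arbitrary whitelist of date and uptime:

```shell
#!/bin/sh
# Hypothetical forced-command wrapper: run only whitelisted commands.
dispatch() {
    case "$1" in
        date)   date ;;
        uptime) uptime ;;
        *)      echo "rejected: $1" >&2; return 1 ;;
    esac
}

# sshd sets SSH_ORIGINAL_COMMAND when a forced command is in effect;
# fall back to date (an assumption for this sketch) when it is unset,
# e.g. for an interactive login attempt.
dispatch "${SSH_ORIGINAL_COMMAND:-date}"
```

Matching the whole requested command string, rather than only its first word, keeps a client from appending extra arguments to an allowed command.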

Tuesday, September 1, 2009

Command to get system hardware serial number.

SMBIOS/DMI standard includes system manufacturer, model name, serial number, BIOS version, asset tag as well as a lot of other details of varying level of interest and reliability depending on the manufacturer. This will often include usage status for the CPU sockets, expansion slots (e.g. AGP, PCI, ISA) and memory module slots, and the list of I/O ports (e.g. serial, parallel, USB).
Solaris X86:

Manufacturer: HP
Product: ProLiant DL360 G3
Serial Number: B038XXXXX

Solaris Sparc:
smbios is not supported on SPARC yet. The traditional prtdiag command works on both x86 and SPARC, but it reports less detailed hardware information.


Create restricted login account

Create a login with a restricted shell that doesn’t allow the user to change the password.

rsh is a limiting version of the standard command interpreter
sh, used to restrict logins to execution environments
whose capabilities are more controlled than those of sh (see
sh(1) for complete description and usage).

The actions of rsh are identical to those of sh, except that
the following are disallowed:

changing directory (see cd(1)),

setting the value of $PATH,

specifying path or command names containing /,

redirecting output (> and >>).
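The restrictions listed above can be tried without creating an account. This is a rough sketch that assumes bash is installed and uses "bash -r", which starts bash in the same restricted mode as rbash; the try helper is a made-up name for this example. The first four commands correspond to the four disallowed actions, the last one is an ordinary command for comparison.

```shell
#!/bin/sh
# try CMD: run CMD in a restricted bash and report whether it was allowed.
try() {
    if bash -r -c "$1" >/dev/null 2>&1; then
        echo "allowed: $1"
    else
        echo "denied:  $1"
    fi
}

try 'cd /'                    # changing directory
try 'PATH=/tmp'               # setting the value of $PATH
try '/bin/echo hello'         # command name containing /
try 'echo hello > /dev/null'  # redirecting output
try 'echo hello'              # plain builtin, no restriction involved
```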

Set restricted shell as login shell

usermod -s /usr/lib/rsh userid
usermod -s /usr/bin/rbash userid

Set the minimum number of days between password changes to a large number, so the user can’t change the password until the minimum days have passed

passwd -n 9999 -x 9998 userid
(Solaris needs both min and max days set, with min greater than max)
passwd -n 9999 userid

Thursday, August 20, 2009

Align partitions on the stripe Boundary for Linux and Windows to boost performance

Aligning partitions on the stripe boundary can boost I/O performance by up to 20%, depending on the file system block size, the stripe size, the intensity of the I/O workload, etc. The disk alignment issue exists in environments where all of the following factors are met:

Disk: hardware RAID (including SAN)
Server: x86 32-bit or 64-bit PC server
OS: Linux or Windows (BSD and Solaris not investigated)

- Sector Size: normally 512 bytes, the industry standard for low-level formatting a single hard disk.
- Stripe Size: the smallest unit used by SAN, hardware RAID and software RAID; it starts from 2 KB and is a power of 2. 32 KB, 64 KB and 128 KB are common stripe sizes.
- Block Size: the smallest amount of disk space that a file system can allocate to hold a file; 4 KB by default for ext3 and NTFS.

Due to an x86 BIOS limitation, the first partition starts at sector 63 by default in Windows and Linux.
As a result, the partition doesn't align with the stripe boundary, so there is a chance that one FS block sits across 2 stripes and one request involves 2 physical I/Os. The chance can be calculated as (FS block size / stripe size), so it is 100% for a 4 KB FS block on a 4 KB stripe.
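The (FS block size / stripe size) estimate can be checked by brute force: walk over consecutive file system blocks and count those whose first and last byte fall in different stripes. A small sketch, assuming 512-byte sectors; straddle_count is a made-up helper name and 160 blocks is an arbitrary sample size.

```shell
#!/bin/sh
# straddle_count START_SECTOR BLOCK_BYTES STRIPE_BYTES NBLOCKS
# Prints how many of the first NBLOCKS blocks cross a stripe boundary.
straddle_count() {
    offset=$(( $1 * 512 ))   # partition start in bytes (512-byte sectors assumed)
    count=0
    k=0
    while [ "$k" -lt "$4" ]; do
        first=$(( offset + k * $2 ))   # first byte of block k
        last=$(( first + $2 - 1 ))     # last byte of block k
        if [ $(( first / $3 )) -ne $(( last / $3 )) ]; then
            count=$(( count + 1 ))
        fi
        k=$(( k + 1 ))
    done
    echo "$count"
}

echo "start 63,  4K block, 64K stripe: $(straddle_count 63 4096 65536 160) of 160"
echo "start 63,  4K block, 4K stripe:  $(straddle_count 63 4096 4096 160) of 160"
echo "start 128, 4K block, 64K stripe: $(straddle_count 128 4096 65536 160) of 160"
```

With a start at sector 63, one block in sixteen straddles a 64K stripe (4K / 64K) and every block straddles a 4K stripe; with a start at sector 128 none do.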

The offset should be a multiple of the stripe size. If you are not sure of the stripe size, starting at 1 MB should be safe.
Take 64K stripe size for example:
((Partition offset) * (Disk sector size)) / (Stripe unit size)

(63 * 512) / 65536=0.4921875
(128* 512) / 65536=1
So the partition should start at sector 128 (65536 bytes) at least
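The same check can be scripted against the Start column reported by fdisk -lu. A minimal sketch; is_aligned is a made-up helper, and sector 2048 is included because with 512-byte sectors it corresponds to the 1 MB offset.

```shell
#!/bin/sh
# is_aligned START_SECTOR SECTOR_BYTES STRIPE_BYTES
# Succeeds when the partition's byte offset is a whole multiple of the stripe size.
is_aligned() {
    offset_bytes=$(( $1 * $2 ))
    [ $(( offset_bytes % $3 )) -eq 0 ]
}

for start in 63 128 2048; do
    if is_aligned "$start" 512 65536; then
        echo "sector $start: aligned to 64K stripe"
    else
        echo "sector $start: NOT aligned to 64K stripe"
    fi
done
```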

- Linux: fdisk -lu
$ fdisk -lu
Device Boot Start End Blocks Id System
/dev/sda1 63 37736684 18868311 83 Linux
- Windows:
All versions up to Windows 2003 are affected by default; Windows 2008 has fixed the issue

- Linux:
In fdisk, enter expert mode by typing x, then select b to adjust the starting block
- Windows:
diskpart detailed in the Windows KB


Friday, July 24, 2009

Generate random password with shell script.

Generate random passwords
Generate random passwords consisting of letters, numbers and special characters.
$ tr -cd \#_[:alnum:] < /dev/urandom |  fold -w 8 | head -5 


$openssl passwd "$RANDOM" | cut -c1-8

Pick the appropriate password
The above one-liners are fine for general purposes, but if there is a password policy, you have to choose a password that adheres to it.

The following script picks a password with at least 1 upper case letter, 1 lower case letter and 1 digit

# Generate a random password that adheres to the password policy
# caveat: if you need a stricter policy, e.g. 2 upper case, 2 lower case, 2 digits, adjust the number returned by head
# The settings below are assumed values; the original definitions are not shown
LENGTH=8; MIN_U=1; MIN_L=1; MIN_D=1; AWK=awk

for i in $(tr -dc '[:alnum:]' </dev/urandom |  fold -w $LENGTH |  head -20)
do
UPPERS=$(echo $i |  $AWK '{print gsub(/[A-Z]/,"")}')
LOWERS=$(echo $i |  $AWK '{print gsub(/[a-z]/,"")}')
DIGITS=$(echo $i |  $AWK '{print gsub(/[0-9]/,"")}')
if [ $UPPERS -ge $MIN_U -a $LOWERS -ge $MIN_L -a $DIGITS -ge $MIN_D ];then
FOUND=1; break
fi
done

if [  -z "$FOUND" ];then
echo "ERROR: could not generate appropriate password"
exit 1
fi
echo "Password Generated :" $i

$ ./genpwd.sh
Password Generated : 8sZrR1az
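The policy check itself can also be written without awk, which makes it easy to test against a known password. This is only a sketch of the same idea; count_class and meets_policy are made-up helper names, and the counts rely on tr character classes.

```shell
#!/bin/sh
# count_class STRING CLASS: number of characters of STRING in the given tr class
count_class() {
    printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '
}

# meets_policy PASSWORD MIN_UPPER MIN_LOWER MIN_DIGIT
meets_policy() {
    [ "$(count_class "$1" '[:upper:]')" -ge "$2" ] &&
    [ "$(count_class "$1" '[:lower:]')" -ge "$3" ] &&
    [ "$(count_class "$1" '[:digit:]')" -ge "$4" ]
}

for candidate in 8sZrR1az abcdefgh; do
    if meets_policy "$candidate" 1 1 1; then
        echo "$candidate: accepted"
    else
        echo "$candidate: rejected"
    fi
done
```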

Thursday, June 25, 2009

A few scripting tips.

How to get the last digits of a string, e.g. print 201 for the string ua07app201?

#sed back reference: print the first match of the pattern enclosed by \( \)
echo ua07app201 | sed 's/.*[^0-9]\([0-9]*\)$/\1/'

#sed delete: delete the longest match ending in a non-digit char
echo 'ua07app201' | sed 's/.*[^0-9]//'

#expr matching operator: similar to the sed back reference; without \( \), it returns the number of matched chars.
expr ua07app201 : '.*[^0-9]\([0-9]*\)$'

#awk: set non-digit as the separator, print the last field $NF
echo 'ua07app201' | nawk -F'[^0-9]' '{print $NF}'

#Perl in command line mode
echo ua07app201 | perl -nle ' $_ =~m /(\d+$)/; print $1'
or the simplified version
echo ua07app201|perl -nle'print/(\d+)$/'

#Parameter substitution: delete the longest match ending in a non-digit char from the beginning.
a='ua07app201';echo "${a##*[!0-9]}"
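The parameter substitution form needs no external process, so it is convenient to wrap in a function. A small sketch (trailing_digits is a made-up name) that also shows the two edge cases: an all-digit string is returned unchanged, and a string ending in a non-digit yields an empty result.

```shell
#!/bin/sh
# trailing_digits STRING: print the run of digits at the end of STRING
trailing_digits() {
    printf '%s\n' "${1##*[!0-9]}"
}

trailing_digits ua07app201
trailing_digits 2009
trailing_digits web
```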

How to get only the directory path from the full path of a file?

#Parameter Substitution, delete the shortest match from end
$ var=/var/tmp/test.txt;echo ${var%/*}
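The mirror-image expansion, deleting the longest match from the beginning, gives the file name. A short sketch with two made-up helper names; note that for a file directly under / (e.g. /test.txt) the %/* form yields an empty string, whereas dirname prints /.

```shell
#!/bin/sh
# dir_of PATH:  delete the shortest match of /* from the end
dir_of() {
    printf '%s\n' "${1%/*}"
}

# file_of PATH: delete the longest match of */ from the beginning
file_of() {
    printf '%s\n' "${1##*/}"
}

var=/var/tmp/test.txt
dir_of "$var"
file_of "$var"
```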

How to sort a string?

#one-liner to sort a string
$echo "s03 s08 s01" | tr '[:space:]' '\n' | sort -n | paste -s
s01     s03     s08
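The one-liner can be wrapped in a function that returns the words separated by single spaces instead of tabs. A minimal sketch; sort_words is a made-up name, and the explicit "-" operand to paste is used because some paste implementations require a file argument.

```shell
#!/bin/sh
# sort_words "w1 w2 ...": print the words sorted, separated by single spaces
sort_words() {
    printf '%s\n' "$1" | tr -s '[:space:]' '\n' | sort | paste -s -d ' ' -
}

sort_words "s03 s08 s01"
```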

Monday, June 1, 2009

ctrl+A and ctrl+E key shortcuts can’t move cursor in bash.

ctrl+A and ctrl+E are the most used key shortcuts for moving the cursor to the start/end of the line. If they stop working, check the command line editor option in bash. There are two command line editors for bash: emacs and vi. The default editor is emacs, which is why ctrl+A and ctrl+E work by default. Don't regard ctrl+A and ctrl+E as built-in features of bash; they can be disabled by changing the editor to vi.

set -o #list current value of the options
emacs off
vi on
set -o vi #enable vi as editor, it will disable emacs automatically like set +o emacs does
set -o emacs #enable emacs as editor, it will disable vi automatically like set +o vi does

Command Editing -- Cursor Movement in command mode for vi
l (el) Move the cursor forward one character
h Move the cursor back one character
w Move the cursor forward one word
b Move the cursor back one word
fc Find character c in the line
0 (zero) Cursor to the start of the line
$ Cursor to the end of the line
l, h, w, f and b can be preceded by a number, thus 2b moves the cursor back two words, and 3fx finds the third occurrence of x in the line. In emacs and gmacs mode, cursor movement is different.

Command Editing -- Cursor Movement in emacs/gmacs
CONTROL-F Move the cursor forward one character
CONTROL-B Move the cursor back one character
ESCAPE then f Move the cursor forward one word
ESCAPE then b Move the cursor back one word
CONTROL-A Cursor to the start of the line
CONTROL-E Cursor to the end of the line


Lastly, set -o options work in ksh as well, but csh/sh don't support them.

Thursday, May 28, 2009

Dtrace Basics

- D Program Structure
probe descriptions
/ predicate /
action statements
-- Probe Descriptions
provider : module : function : probeName
syscall::*lwp*:entry, syscall::*sock*:entry #wildcards are supported; if one or more fields of the probe description are omitted, they match any value
--- list available providers, modules, functions and names
$dtrace -l | more
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
4 nfsmapid209 nfsmapid check_domain daemon-domain
5 nfsmapid209 nfsmapid resolv_query_thread thread-domain
6 syscall nosys entry
-- Predicates
Predicates are expressions enclosed in slashes / / that are evaluated at probe firing time to determine whether the associated actions should be executed.
The D language doesn't have control-flow constructs such as if-statements and loops; it uses predicates instead.
-- Actions
Probe actions are described by a list of statements separated by semicolons (;) and enclosed in braces { }. If no additional action is needed, use an empty set of braces with no statements inside.
-- example
$dtrace -n syscall::read:entry #

-n matches the probe name given on the command line, -m matches the module name
$dtrace -s counter.d #-s reads input from a script
$vi counter.d
i = 10;
/i > 0/
/i == 0/

dtrace/profile are providers
tick-5sec #tick-Nsec is a probe name of the profile provider, like sleep in shell
trace (100) #print out a value or string (a string must be enclosed in " ", e.g. trace ("hello")), like echo in shell
printf ("%s","hello") # print out in a particular format
/* ... */ #comment lines
If there is no dtrace:::END statement, you need to press ctrl+c to see the result.

--Use built-in variables for predicates
/pid == 12345/

execname: Name of the current process's executable file
pid:Process ID of the current process
tid: Thread ID of the current thread

- Aggregations
DTrace stores the results of aggregating functions in objects called aggregations. The aggregation results are indexed using a tuple of expressions similar to those used for associative arrays. In D, the syntax for an aggregation is
@name[ keys ] = aggfunc ( args );
Aggregations are used for result data, so the entire data set need not be stored
Aggregations are printed out by default; no print statement is needed
-- Example
# Syscall count by process,
dtrace -n 'syscall:::entry { @num[pid,execname] = count(); }'

-- DTrace Aggregating Functions
count: The number of times called.
sum: The total value of the specified expressions.

- Structs
If you have programmed in the Java programming language, think of a D struct as a class, but one with data members only and no methods.
struct callinfo {
uint64_t ts; /* timestamp of last syscall entry */
uint64_t elapsed; /* total elapsed time in nanoseconds */
uint64_t calls; /* number of calls made */
size_t maxbytes; /* maximum byte count argument */
};
You can use the -> operator to access struct members through a pointer, e.g. callinfo->ts

- Further reading..
Solaris Dynamic Tracing Guide - Official Dtrace guide
DTrace Tools - collection of useful scripts

Difference between Red Hat Linux and SUSE Linux

Red Hat Linux and SUSE Linux don't differ too much, and the universal management tool YaST makes it easier for Red Hat Linux users to grasp SUSE Linux. The following are the important differences between Red Hat Linux and SUSE Linux from a sysadmin perspective (based on SUSE Linux Enterprise Server 10).

- Management tool
YaST is the central system management tool for SUSE Linux.
YaST is a collection of modules, e.g. "yast users" goes to the user management module directly; "yast -l" lists all modules.

-- launch YaST:
yast: start in text mode
yast2: start in text mode or GUI mode depending on the terminal capability

- Installation automation
Red Hat Linux: Kickstart; SUSE Linux: AutoYaST

- Boot
-- boot files:
/etc/init.d/boot #master boot control file called by the init process
/etc/init.d/boot calls various scripts in /etc/init.d/boot.d/* to load modules, mount file systems, activate the network, etc.
Files in /etc/init.d/boot.d/Sxxboot.* are symlinks to /etc/init.d/boot.*, e.g. (/etc/init.d/boot.d/S08boot.localfs -> ../boot.localfs).
They are executed in ascending order.
Finally, /etc/init.d/boot calls /etc/init.d/boot.local (like /etc/rc.local in Red Hat Linux, for user-defined startup commands)

-- Service control
Both can use the chkconfig tool to control services. In addition, SUSE Linux has an rcSVCNAME command, linked to the start script, to control the service directly, e.g.
#file /usr/sbin/rcsshd
/usr/sbin/rcsshd: symbolic link to `/etc/init.d/sshd'

- User
On SUSE, a new user is added to the users group by default; Red Hat Linux adds the user to a new private group
For bash, the local profile is ~/.profile; Red Hat Linux uses ~/.bash_profile

- Network
/etc/sysconfig/network/* #all network related config files
/etc/sysconfig/network/ifcfg-eth-id-MA-CA-DD-RE-SS #interface config files
/etc/HOSTNAME #set hostname,the file name is upper case.
/etc/sysconfig/network/routes #static routes

- Security
Red Hat Linux: SElinux
SuSE Linux: AppArmor
Both are based on Linux Security Modules (LSM) to overcome the limitations of traditional Discretionary Access Control (DAC), but the configuration details are totally different. The default configuration may cause weird issues; AppArmor can be stopped or started with the script /etc/init.d/boot.apparmor

Detailed document for Migrating from RedHat to SUSE Linux Enterprise Server 10

Tuesday, May 19, 2009

Netapp Notes

#== Basics
- Introduction
Data ONTAP is the name of NetApp's platform OS; it is based on BSD.
NetApp appliances are based on x86 PC hardware (mostly AMD Opteron nowadays).
A NetApp appliance is unified storage that supports file-based protocols such as NFS, CIFS, FTP, TFTP and HTTP, and block-based protocols such as FC and iSCSI.
NetApp V-Series can attach and manage third party storage systems.
NVRAM: logs transactions; the log can be replayed in the event of an unplanned shutdown.
RAM: system memory, data read/write cache
Flash card (not all models): initial system boot media

FC Controller port 0a/0b/0c/0d ....
Jumper to setup Disk shelf ID 1 - 7
Disk numbering from right to left 16 ......
WWNN - World Wide Node Name
identifies the node (device); it is valid for the same WWNN to be seen on many different ports (different addresses)
WWPN - World Wide Port Name
is a World Wide Name assigned to a port in a Fibre Channel fabric; it performs a function equivalent to the MAC address in the Ethernet protocol

Initiator = client, target = server
Software initiator with a standard NIC
TCP Offload Engine (TOE) with a software initiator: offloads TCP processing from the CPU
iSCSI HBA: hardware initiator that also provides diskless boot
There is no need for an iSCSI HBA if the diskless boot feature is not needed and CPU resource is plentiful

NetApp uses software iSCSI if no iSCSI HBA is present.

# setup iscsi on linux
[root@linux /]# iscsi-iname
The ID generated is random; write it to /etc/initiatorname.iscsi to make it persistent

- help
help # list all cmds
cifs help shares #display help for sub command
man cmd
priv set advanced # to use advanced command e.g ls
- account management
useradmin group list
useradmin user add admin -g Administrators
- Access netapp
http://IP/na_admin # web gui URL.
ssh IP # ssh (or rsh, telnet) for CLI access; enable ssh with the secureadmin cmd
rsh IP #execute cmd remotely from the admin server; the admin server should be added to /etc/hosts.equiv
Mount /vol/vol0/ as root on a Unix machine to edit config files
- Read/write file
rdfile /etc/exports
wrfile -a /etc/hosts filer1 #append to file
wrfile /etc/hosts.equiv #rewrite file: type in lines, then Ctrl+C
- Error messages
rdfile /etc/messages
- backup config file
config dump config.bak
it is saved to /etc/configs/config.bak
- server setting
The options command controls server settings
options ftpd.enable on #enable ftp server for example
options nfs.export.auto-update off #turn off auto export, otherwise the new volume will be exported automatically
all options are saved to /etc/registry
- system stats
stats show
sysstat -su 1
- copy entire volume
vol copy
ndmpd on
ndmpcopy -f /vol/vol0 /vol/vol0_trad
- Cluster
A NetApp cluster doesn't do I/O load balancing; it is just for fail-over purposes. You need to allocate disks on each node for different services.

#== Boot
Press Ctrl+C to go to the boot menu.
There is an option to reset the password.
Type in "22/7" to show the secret boot menu.

#== Storage

Qtree, and/or subdirectories, export-able
Volume (TradVol, FlexVol), export-able, snapshot configured at this level.
aggregate (Data ONTAP 7.0 and up)
plex (relevant mostly in mirror conf)
raid group

disk zero spare # zero all spare disks so they can be added quickly to a volume
Data ONTAP supports 100 aggregates (including traditional volumes) on a single storage system.
Data ONTAP supports 500 volumes per head (on FAS2020 and FAS200 series, the limit is 200 FlexVol volumes). So in a cluster environment, the combined number of volumes on both nodes should not exceed the limit, for the sake of failover.
NetApp WAFL (Write Anywhere File Layout) block size = 4 KB.
NetApp supports RAID0, RAID4, RAID-DP (double parity) and RAID 1 (via snapmirror)

- Traditional Volume
It is tightly coupled with its containing aggregate. No other volumes can get their storage from this containing aggregate. It can't be shrunk, but it can be expanded by adding more disks.

- FlexVol volumes
A FlexVol volume is a volume that is loosely coupled to its containing aggregate. A FlexVol volume
can share its containing aggregate with other FlexVol volumes,Thus, a single aggregate can be the
shared source of all the storage used by all the FlexVol volumes contained by that aggregate.

- flex clone
FlexClone volumes always exist in the same aggregate as their parent volumes
You cannot delete the base Snapshot copy in a parent volume while a FlexClone volume using that
Snapshot copy exists. The base Snapshot copy is the Snapshot copy that was used to create the
FlexClone volume, and is marked busy, vclone in the parent volume.

- snapshot
255 snapshots per volume
A Snapshot copy is a frozen, read-only image of a traditional volume, a FlexVol volume, or an aggregate
that captures the state of the file system at a point in time. It doesn't consume space initially, snapshot grows only as data changes.
- Aggregate
Max space: 16TB Max Number: 100
1 big aggregate runs faster than multiple aggregates created on the same number of physical disks
A 16-disk setup is the sweet spot for space utilization and performance
- Lun
A LUN is created on top of a volume and can't exceed the volume size. It is used for FC/iSCSI mounts.

#== command
sysconfig -r # show raid group and spare disks
aggr status -s #only show spare disks
sysconfig -V #show aggregate name and the number of disks owned
sysconfig -d # show physical disks
storage show disk # show physical disks

- Traditional volume
vol create travol1 3 #create a traditional volume with 3 disks; disks are selected automatically
vol create trad02 -d 0a.24 0a.25 0a.27 #create a traditional volume by specifying disk names

- Flex volume
aggr create aggr2 4 #create aggregate first
vol create flexvol1 aggr2 20M #create volume on the aggregate
vol offline trad02
vol destroy trad02 #delete volume, need to be brought offline first
vol options vol1 nosnap on #turn off automatic scheduled snapshots, not the snapshot ability
aggr show_space #show aggregate and volume space
aggr options aggr1 raidtype raid_dp #change raidtype between raid4 and raid_dp, or from raid0 to raid4/raid_dp
df -h #show volume space

- Snapshot
snap create vol0 mysnap0
snap delete vol0 mysnap0
snap list vol0 #list snapshot
snap delta vol0 #show size of changed data
/vol/vol0/.snapshot/mysnap0 #access snapshot data

#== Network
- show ip/change ip
- add route
route add net 1
- permanent add
wrfile -a /etc/rc route add net 1
- vi /etc/rc
routed on # turn on RIP routing
- Link aggregation
- Packet tracing
pktt start ns0
pktt dump ns0 #an xx.trc file will be saved to /
pktt stop ns0

#== NFS
- turn off nfs auto export for new volume
options nfs.export.auto-update off
- Show exports
exportfs -v
- show detailed export options
exportfs -q /vol/vol0
/vol/vol0 -sec=sys,(ruleid=0),rw,anon=0,nosuid
- Permanent export and add entry to /etc/exports
exportfs -p sec=sys,rw,nosuid /vol/vol1
- Temp export and don't add entry to /etc/exports
exportfs -io sec=sys,rw,nosuid /vol/vol1
- Permanent unexport, remove from /etc/exports
exportfs -z path
- Temp unexport
exportfs -u path
- Re-read /etc/exports and re-export
exportfs -r
- Control access
exportfs -io sec=sys,rw= /vol/vol1
exportfs enable nosave /vol/vol1

#== CIFS
- Stop/start service
cifs terminate / cifs restart
- Initial setup
cifs setup /* select (3) Windows Workgroup authentication using the filer's local user accounts */
- Determine if both nfs client/cifs client access system
options wafl.default_security_style unix ntfs mixed
- display shares
cifs shares
- add share
cifs shares -add HOME /vol/vol0/home
- add permission
cifs access -delete home everyone
cifs access HOME Administrators "Full Control"

Wednesday, April 29, 2009

Install Netapp simulator on Virtualbox

The NetApp simulator provides almost the full functionality of a real NetApp filer, but it can only run on Linux, so I installed it on CentOS 5.2 within VirtualBox. The installation went well, but I couldn’t ping the simulator from CentOS. I almost gave up after numerous attempts, until I found out that NetApp’s “parent OS” is not supposed to access the simulator, by design (VMware doesn’t have this restriction). So I have to access NetApp from my host OS (Windows XP).

NetApp supports two VirtualBox network types: host network and internal network. After setting up NetApp, don’t bother pinging it from its “parent OS”. Just access it from your host OS (host network type) or from another guest OS instance (internal network type; the network name must be the same).

Tuesday, April 28, 2009

Add Unix user to Windows AD by Vbscript

Windows AD has become a popular choice for managing Unix accounts. Windows is known for its fantastic GUI, but that doesn’t mean it lacks scripting ability. This note shows how to add a Unix user to Windows 2003 AD with VBScript.
NOTE: The unix attribute is msSFU30UidNumber ... in my Server, you can doublecheck your value by browsing ldap path:LDAP://CN=" & strUnixDomain  & ",CN=ypservers,CN=YPSERV30,CN=RpcServices,CN=System," &strDomain

#==Usage Example
D:\>cscript add-user.vbs John Smith
Created: John Smith Username=John.Smith Password=3a5RurD4

#==Script Content
'UPN format: firstname.lastname@yourdomain.com.au
'Create new user in  ou=Developers,dc=yourdomain,dc=com,dc=au
'Generate a random password and set it for the new user
'Set a free UnixUID based on msSFU30MaxUidNumber
'Set the pre-defined strUnixGid
'But no new Windows group membership assigned, it still belongs to domain users by default
'Author: http://honglus.blogspot.com 


strUnixShell ="/bin/bash"

strDomain = "dc=yourdomain,dc=com,dc=au" 
strParentDN = "ou=Developers," & strDomain

' ------ END CONFIGURATION ---------

if  (WScript.Arguments.Count <> 2 ) then 
wscript.echo "*ERROR* Expected minimum input: 2,   Given:"&  WScript.Arguments.Count
wscript.echo "- USAGE: PROGRAM Firstname LastName"
wscript.echo "- EXAMPLE: PROGRAM John " &"""Enclose Space""" 
WScript.Quit 1
End if

' Arguments are indexed from 0
strFirstName = WScript.Arguments.Item(0)
strLastName = WScript.Arguments.Item(1)
strLogin = strFirstName & "." & strLastName


strFullname = strFirstName & " " & strLastName
strUnixHome ="/home/"&strLogin
strUserpn = strLogin & strDomainUPN

set objParent = GetObject("LDAP://" & strParentDN)
Set objUser = objParent.Create("user", "cn=" & strFullname)
objUser.Put "sAMAccountName", strLogin
objUser.Put "UserPrincipalName", struserpn
objUser.Put "givenName", strFirstName
objUser.Put "sn", strLastName
objUser.Put "displayName", strFullName
objUser.Put "msSFU30NisDomain", strUnixDomain
objUser.Put "msSFU30UidNumber", strUnixUid
objUser.Put "msSFU30LoginShell", strUnixShell
objUser.Put "msSFU30HomeDirectory", strUnixHome
objUser.Put "msSFU30GidNumber", strUnixGid
objUser.Put "userAccountControl", ADS_UF_DONT_EXPIRE_PASSWD
'   objUser.Put "userAccountControl", ADS_UF_NORMAL_ACCOUNT  
WScript.Echo "Created: " & strFirstName & " " & strLastName & " Username=" & strLogin & " Password=" & strrndPass

' Generate random password 

Function RndPassword(vLength)

' Always include a-z,A-Z,0-9
strPass3=strPass3& chr(Int((122 - 97 + 1) * Rnd + 97))   
strPass3=strPass3& chr(Int((90 - 65 + 1) * Rnd + 65))    
strPass3=strPass3& chr(Int((57 - 48 + 1) * Rnd + 48))   

'Skip the 3 char already created
For x=4 To vLength

intIndex=Int((3 - 1 + 1) * Rnd + 1) '[1-3]

select case intIndex
case 1
strPass = chr(Int((122 - 97 + 1) * Rnd + 97))    '[a-z]
case 2
strPass=chr(Int((90 - 65 + 1) * Rnd + 65))  '[A-Z]
case 3
strPass=chr(Int((57 - 48 + 1) * Rnd + 48)) '[0-9]
case Else
strPass=chr(Int((57 - 48 + 1) * Rnd + 48)) '[0-9]
end select
RndPassword = RndPassword & strPass
Next

RndPassword = RndPassword &strPass3

End Function

function getMaxUid

strquery="LDAP://CN=" & strUnixDomain  & ",CN=ypservers,CN=YPSERV30,CN=RpcServices,CN=System," &strDomain
set ypdomain=getobject(StrQuery)


' wscript.echo "The current free Max UID=" &uidmax


'Increase Maxuid by 1


End function

Monday, April 27, 2009

Changing Linux user's password with script

It is time consuming to change a user’s password on many hosts. The expect language can be used to change a password without typing it, but the chpasswd tool in Linux is easier to use. chpasswd is from the pwdutils RPM; it should be available on all Linux distributions.

#== Create the script to change password.
$vi chpwd.sh

echo "root:newpasswd" | /usr/sbin/chpasswd 

$chmod +rx chpwd.sh

#==Copy the script to the remote host and execute it
sudo is used because root ssh login is disabled. For an unknown reason the temp file couldn't be deleted with sudo, so it is emptied instead.

$scp -p chpwd.sh remote-host:/tmp/chpwd.sh
$ssh remote-host sudo '/tmp/chpwd.sh;cat /dev/null>/tmp/chpwd.sh;cat /tmp/chpwd.sh'

Zenoss monitor customized application via SNMP

Zenoss can monitor remote customized applications by various methods, e.g. SSH/NRPE/SNMP; this note demonstrates the SNMP method.
If your target host doesn't support SSH/NRPE, mentioned in the last two posts, SNMP is a good option. Even if your app doesn't have a built-in SNMP OID, net-snmp allows you to map an OID to your app. The solution is not perfect: the alarm is triggered whenever the app fails, but the detailed error message given by the app is not available.
#== ENV
Zenoss 2.3.3 + Centos 5.2 + Net-snmp 5.3.1
#== Setup SNMP
Please make sure you have basic snmp working, refer to my post
Set up Net-snmp on CentOS
The OID definition to be used is: /usr/share/snmp/mibs/UCD-SNMP-MIB.txt
Its OID range is ., some values have been used, let's start with.
There are specifications about the OID, the ID of interest is 2021.ID.100.

2021.ID.1 : an integer index value. In scalers, this is always
of value 1. In tables it is a row index.
2021.ID.2 : a name of the script, process, etc. that this row represents.
2021.ID.100 : An error flag indicating if an error is present on
that row (a threshold value was crossed, etc).
2021.ID.101 : An error string describing why the error flag is non-0

#==== Create a test script

$vi /usr/local/bin/check_test.sh
#!/bin/sh
flag=1   # assumed placeholder: your real check sets this, 0 means healthy
if [ "$flag" -eq 0 ]; then
echo "SNMP check test -OK"
exit 0
else
echo "SNMP check test -FAILED"
exit 1
fi

$chmod +rx /usr/local/bin/check_test.sh
#==== Map an OID to the script
$vi /etc/snmp/snmpd.conf
exec . check_test /usr/local/bin/check_test.sh
#==== Execute the script by querying the OID
$snmpwalk -v2c -c public localhost .
UCD-SNMP-MIB::ucdavis.200.1.1 = INTEGER: 1
UCD-SNMP-MIB::ucdavis.200.2.1 = STRING: "check_test"
UCD-SNMP-MIB::ucdavis.200.3.1 = STRING: "/usr/local/bin/check_test.sh"
UCD-SNMP-MIB::ucdavis.200.100.1 = INTEGER: 1
UCD-SNMP-MIB::ucdavis.200.101.1 = STRING: "SNMP check test -FAILED"
UCD-SNMP-MIB::ucdavis.200.102.1 = INTEGER: 0
UCD-SNMP-MIB::ucdavis.200.103.1 = ""
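
The two rows that matter in this output are .100 (the error flag) and .101 (the error string). A minimal sketch of pulling them out of snmpwalk output on stdin; the function name parse_exec_status is hypothetical, and the line format is assumed to match the sample above:

```shell
# Hypothetical helper: reads snmpwalk output on stdin and prints
# "<error flag> <error string>" taken from the .100 and .101 rows.
parse_exec_status() {
    awk '
        /\.100\.1 = INTEGER:/ { flag = $NF }
        /\.101\.1 = STRING:/  {
            sub(/^[^"]*"/, "")   # drop everything up to the opening quote
            sub(/"$/, "")        # drop the closing quote
            msg = $0
        }
        END { print flag, msg }
    '
}

# Usage (not executed here): pipe the snmpwalk command above into parse_exec_status
```

A non-zero flag is what the Zenoss threshold defined below (min 0, max 0) reacts to.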
#==Create template under Devices/Server to use the script
(You can create template under any scope e.g Devices/Server/linux)
Classes->Devices->Server(sub-Devices)Templates->Add Template (add template is hidden drop down menu brought up by clicking the small triangle button)

New data Source ( ID: userdefined TYPE: SNMP)
New data Point
name: check_test_SNMP
type: GAUGE
New Thresholds
name: check_test_SNMP
Datapoint: check_test_SNMP_check_test_SNMP
min value: 0
max value: 0
Event Class: /perf/snmp (can be anything)
Severity: error
Enabled: true

#==Bind the template to your Device
Device List->yourdevice->Open->
Click the small triangle button->More->Template
Click the small triangle button->Bind Templates->add new template to selection (You can select multiple templates)

#== Test, you should be able to see the new datasource was picked up by Zenoss
$/opt/zenoss/zenoss/bin/zenperfsnmp run -d -v10
DEBUG:zen.thresholds:Updating threshold ('check_test_SNMP', ('', ''))

Thursday, April 9, 2009

Zenoss monitor customized application via NRPE

Zenoss doesn't have a native NRPE plugin like OpenNMS, but it can run customized applications through the ZenCommand process. ZenCommand can run any command locally, or remotely over a native SSH transport. On each run, Zenoss tracks the return code (0 = success, non-zero = fail).
Zenoss can monitor customized applications by various methods, e.g. SSH/NRPE/SNMP. This note demonstrates the NRPE method.

#==Install NRPE

Follow instructions here, until you can run remote command via NRPE
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H remote-host -c check_test

#==Create template under Devices/Server to use the script
(You can create template under any scope e.g Devices/Server/linux)
Classes->Devices->Server(sub-Devices)Templates->Add Template (add template is hidden drop down menu brought up by clicking the small triangle button)
New data Source ( ID: userdefined–NRPE TYPE: command)

use ssh :false
Event Class:/cmd/fail
Command Template: /usr/lib/nagios/plugins/check_nrpe -H ${here/manageIp} -c check_test

#==Bind the template to your Device
Device List->yourdevice->Open->
Click the small triangle button->More->Template
Click the small triangle button->Bind Templates->add new template to selection (You can select multiple templates)
/opt/zenoss/zenoss/bin/zencommand run -d -v10
The output shows that zencommand found the new script and executed it.
Back in the GUI, the alarm should appear in the event log of the device.

Wednesday, April 8, 2009

Zenoss monitor remote customized application via SSH

I have used three open source NMS apps: Nagios, OpenNMS and Zenoss. Zenoss is the best I have found so far. Nagios doesn't support SNMP and has no graphing ability. OpenNMS supports SNMP and graphing, but it needs a restart for a new config to take effect.
Zenoss doesn't have a native NRPE plugin like OpenNMS, but it can run customized applications through the ZenCommand process. ZenCommand can run any command locally, or remotely over a native SSH transport. On each run, Zenoss tracks the return code (0 = success, non-zero = fail).
Zenoss can monitor customized applications by various methods, e.g. SSH/NRPE/SNMP. This note demonstrates the SSH method.
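
The return-code convention is easy to try out locally before wiring a script into Zenoss. A minimal sketch, assuming a hypothetical wrapper named report_check; it is not part of Zenoss, it only mirrors the 0 = success rule:

```shell
# Hypothetical wrapper: runs a check command and reports it the way a
# return-code based monitor would read it (0 = success, anything else = fail).
report_check() {
    output=$("$@" 2>&1)
    rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "OK: $output"
    else
        echo "FAIL($rc): $output"
    fi
    return "$rc"
}

# Usage (not executed here): report_check /usr/local/bin/check_test.sh
```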

#==On the remote host, create a test script
$vi /usr/local/bin/check_test.sh

#!/bin/sh
flag=1   # assumed placeholder: your real check sets this, 0 means healthy
if [ "$flag" -eq 0 ]; then
echo "check test-OK"
exit 0
else
echo "check test-FAILED"
exit 1
fi

$chmod +rx /usr/local/bin/check_test.sh

#==On zenoss
#====Test ssh remote command manually with password authentication
ssh zenoss@remote-ip /usr/local/bin/check_test.sh

#====set ssh username/password on Zenoss

zCommandUsername =zenoss
If you prefer to use an ssh key, only enter zCommandUsername (it appears that only dsa type keys work).
This is a global setting: it applies to all devices.

#====Create template to use the script

Classes--Devices--Templates--Add Template (add template is hidden drop down menu brought up by clicking the small triangle button)
New data Source ( ID: userdefined TYPE: command)

use ssh :true
Event Class:/cmd/fail
Command Template:/usr/local/bin/check_test.sh
#====Bind the template to your Device
Device List--yourdevice--Open--
Click the small triangle button--More--Template
Click the small triangle button--Bind Templates--add new template to selection (You can select multiple templates)
/opt/zenoss/zenoss/bin/zencommand run -d -v10

INFO:zen.zencommand:---------- - schedule has 1 commands
DEBUG:zen.zencommand:Next command in 299.977344 seconds
DEBUG:zen.SshClient: host key: 66:d3:e2:09:45:80:36:0d:16:77:0a:db:7a:9d:4a:e6
DEBUG:zen.SshClient:creating new SSH connection...
DEBUG:zen.SshClient:Attempting to authenticate using username: zenoss
DEBUG:zen.SshClient:Getting SSH public key from ~/.ssh/id_dsa
DEBUG:zen.SshClient:Expanded key path from ~/.ssh/id_dsa to /home/zenoss/.ssh/id_dsa
DEBUG:zen.SshClient:Getting SSH private key from ~/.ssh/id_dsa
DEBUG:zen.SshClient:Expanded key path from ~/.ssh/id_dsa to /home/zenoss/.ssh/id_dsa
INFO:zen.SshClient:Connected to device
DEBUG:zen.SshClient:started the channel
DEBUG:zen.SshClient:opening command channel for /usr/local/bin/check_test.sh
DEBUG:zen.SshClient:running command remotely: exec /usr/local/bin/check_test.sh
DEBUG:zen.SshClient:command /usr/local/bin/check_test.sh data: 'check test-FAILED\n'
DEBUG:zen.zencommand:Process check_test.sh stopped (1), 1.614611 elapsed
DEBUG:zen.zencommand:The result of "/usr/local/bin/check_test.sh" was "check test-FAILED

As the output shows, it connected to the remote host via ssh and then ran the command /usr/local/bin/check_test.sh
Back in the GUI, the alarm should appear in the event log of the device.

Tuesday, March 31, 2009

Solaris/Linux: find port number for a program and vice-versa

#== Find port number for a program
- lsof tool(Platform independent)

$lsof -nc sshd | grep TCP

sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP> (ESTABLISHED
- Linux
$netstat -anp |grep sshd

tcp 0 0 :::22 :::* LISTEN 1962/sshd
- Solaris
$ pfiles 16976
sockname: AF_INET port: 22

#==Find program name for port number
- lsof tool(Platform independent)
$lsof -i TCP:22
sshd 1962 root 3u IPv6 6137 TCP *:ssh (LISTEN)
sshd 2104 root 3u IPv6 7425 TCP> (ESTABLISHED)
- Linux
$netstat -anp | grep 22
tcp 0 0 :::22 :::* LISTEN 1962/sshd
- Solaris
List the open files of all processes, then search the resulting file for "port: 22"

$ ps -e -o pid | xargs pfiles > /tmp/pfiles.log 
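
Searching the log by eye gets tedious, so the lookup can be scripted. A minimal sketch, assuming pfiles output in which each process starts with a "PID:" header line and socket lines end in "port: N"; the function name find_port_owner is hypothetical:

```shell
# Hypothetical helper: reads pfiles output on stdin and prints the header
# line (PID and command) of every process that has a socket on the given port.
find_port_owner() {
    # $1 = port number to look for
    awk -v port="$1" '
        /^[0-9]+:/                 { owner = $0 }      # remember the current process header
        $0 ~ ("port: " port "$")   { print owner }     # socket line for the wanted port
    '
}

# Usage (not executed here): find_port_owner 22 < /tmp/pfiles.log
```

The trailing `$` in the pattern keeps port 22 from also matching ports such as 2222.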


Quick SELinux notes for the impatient, read full document at


SELinux has 2 levels of access control:
1) File context: a daemon can only access files with a particular file context
2) Boolean value: enable/disable a feature
For example: by default SELinux does not allow FTP users to log in and read their home directories; turn it on with "setsebool -P ftp_home_dir 1"

#==Confined and Unconfined Process
A confined process enters a particular domain after it starts; only that domain has access to files of a particular TYPE
SELinux has no effect on unconfined processes (apps that don't support SELinux)

$ ls -Z /usr/sbin/httpd
-rwxr-xr-x root root system_u:object_r:httpd_exec_t /usr/sbin/httpd #httpd is confined by default
$chcon -Rt unconfined_exec_t /usr/sbin/httpd #change httpd to unconfined_exec_t; it will enter the unconfined domain, so it can access any file as long as OS-level file permissions allow
$ restorecon -Rv /usr/sbin/httpd #restore default type

#== SELinux: File context
for example: system_u:object_r:httpd_sys_content_t :s0:c0
Not all systems will display s0:c0

# ls -aZ /var/www/html/
drwxr-xr-x root root system_u:object_r:httpd_sys_content_t .
drwxr-xr-x root root system_u:object_r:httpd_sys_content_t ..
# ls -aZd /home
drwxr-xr-x root root system_u:object_r:home_root_t /home
httpd_exec_t can access httpd_sys_content_t not home_root_t
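
Since access decisions hinge on the type field, it can help to extract it from a context string when comparing files. A minimal sketch, assuming the user:role:type[:level] layout shown above; the function name selinux_type is hypothetical:

```shell
# Hypothetical helper: prints the type (third colon-separated field)
# of an SELinux context string.
selinux_type() {
    # $1 = context string, e.g. system_u:object_r:httpd_sys_content_t:s0
    echo "$1" | cut -d: -f3
}

# Usage (not executed here): selinux_type system_u:object_r:home_root_t
```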

#==SELinux management
SELINUX=permissive #in /etc/selinux/config. If it is changed from disabled, a reboot is needed to label files
getenforce or sestatus #get current status
setenforce 0 # set to permissive mode
setenforce 1 #set to enforce mode
getsebool -a #list booleans and its value , no desc
setsebool httpd_can_network_connect_db on #change current boolean
setsebool -P httpd_can_network_connect_db on #change permanent boolean with -P

- Temporary change of context
chcon -R -t httpd_sys_content_t /web/ #change context type of dir/file
# it will survive a reboot, but not a relabel. To relabel: touch /.autorelabel; reboot

- Persistent Changes: semanage fcontext
/etc/selinux/targeted/contexts/files/file_contexts #original (default) contexts are saved here
/etc/selinux/targeted/contexts/files/file_contexts.local #new user-defined contexts are saved here
semanage fcontext -a -t samba_share_t /etc/file1 #-a add new context, the file doesn't need to exist.
restorecon -Rv /etc/file1 #read the new customized context and apply it

- Restore default context
semanage fcontext -d /etc/file1 #remove context,the file doesn't need to exist
restorecon -RFv /etc/file1 #apply the change; -F is needed to restore from the customized context to the default.

/var/log/audit/audit.log #enable auditd daemon first
chkconfig --levels 345 setroubleshoot on #enable troubleshoot daemon
sealert -a /var/log/messages #analyse log
sealert -l \* #show all alert
grep "SELinux is preventing" /var/log/messages
grep "denied" /var/log/audit/audit.log
Port Numbers # services are allowed to run on some defined ports
/usr/sbin/semanage port -l | grep http_port_t
http_port_t tcp 80, 443, 488, 8008, 8009, 8443
semanage port -a -t http_port_t -p tcp 9876 #add the new port to allowed range

#==== document
selinux-policy-2.4.6-137.el5#man pages for ftpd_selinux, samba_selinux ...etc

Friday, March 27, 2009

OpenNMS monitor disk space usage by SNMP

This post demonstrates two ways to monitor disk space usage by 2 different SNMP MIBS
1) .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
2) .iso.org.dod.internet.mgmt.mib-2.host.hrStorage

What is the difference? Option #1 requires the disk path to be hardcoded in snmpd.conf on the target system, but Option #2 can monitor all partitions by default. Even if there is a need to monitor specific partitions, the filter is set on OpenNMS, not on the target system, so Option #2 is more flexible.

The alarm is triggered by threshold in SNMP, so you don't need to setup monitors to trigger alarm. Firstly, Please make sure you have basic snmp working, refer to my post Set up Net-snmp on CentOS

OpenNMS 1.6.2 + Centos 5.2 +net-snmp 5.3.1

#==(1) Monitor disk space usage by dskTable MIB
Add the partition to be monitored to snmpd.conf

$vi /etc/snmp/snmpd.conf
disk /opt2

#====Test by snmpwalk first

$snmpwalk -v2c -c public .iso.org.dod.internet.private.enterprises.ucdavis.dskTable
UCD-SNMP-MIB::dskPath.1 = STRING: /opt2 

OpenNMS 1.6.2 has a default threshold set, so you don't need to configure anything in OpenNMS. Double-check the threshold at
GUI->Admin->Manage Thresholds->netsnmp

#====Sample alarm appeared
High threshold exceeded for SNMP datasource ns-dskPercent on interface, parms: ds="ns-dskPercent" value="100.0" threshold="90.0" trigger="2" rearm="75.0" label="/opt2" ifIndex="2

#== (2) Monitor disk space usage by hrStorage MIB
no need to add disk path to snmpd.conf, but openNMS needs to be customized.

#===Test by snmpwalk first

$snmpwalk -v2c -c public .iso.org.dod.internet.mgmt.mib-2.host.hrStorage

#==== Find systemOID
This OID is the same for all Net-SNMP agents

$snmptranslate .iso.org.dod.internet.private.enterprises.netSnmp.netSnmpEnumerations.netSnmpAgentOIDs -On


#==== Include the SysOID to ./etc/datacollection-config.xml
By default, mib2-host-resources-storage is not included for Net-SNMP
The value in sysoidMask should include your systemOID
For example sysoidMask . includes .

<systemDef name="Net-SNMP">
<includeGroup>mib2-host-resources-storage</includeGroup>

#====Include your sysOID to ./etc/threshd-configuration.xml

<package name="hrstorage">
<filter>IPADDR != '' & (nodeSysOID LIKE '.' nodeSysOID LIKE '.' nodeSysOID LIKE '.')</filter>

OpenNMS 1.6.2 has set default threshold, doublecheck the threshold by
GUI->Admin->Manage Thresholds->hrStorage

By default, it monitors all partitions. You can create a filter to monitor specific partitions only

#===Sample alarm appeared
High threshold exceeded for SNMP datasource hrStorageUsed / hrStorageSize * 100.0 on interface, parms: ds="hrStorageUsed / hrStorageSize * 100.0" value="94.88754412506304" threshold="90.0" trigger="2" rearm="75.0" label="/opt2" ifIndex="2"
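
The datasource in this alarm is just an expression, hrStorageUsed / hrStorageSize * 100.0, compared against the threshold. A minimal sketch of the same arithmetic for checking values by hand; the function names storage_pct and over_threshold are hypothetical and not part of OpenNMS:

```shell
# Hypothetical helper: usage percentage from the two hrStorage values.
storage_pct() {
    # $1 = hrStorageUsed, $2 = hrStorageSize (both in allocation units)
    awk -v used="$1" -v size="$2" 'BEGIN { printf "%.1f\n", used / size * 100 }'
}

# Hypothetical helper: exit status 0 when the value is above the threshold.
over_threshold() {
    # $1 = percentage, $2 = threshold
    awk -v v="$1" -v t="$2" 'BEGIN { exit !(v > t) }'
}

# Usage (not executed here):
#   over_threshold "$(storage_pct 950 1000)" 90.0 && echo "would raise the alarm"
```

This only mirrors the comparison; the trigger and rearm settings shown in the alarm are handled by OpenNMS itself.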

Wednesday, March 25, 2009

Setup net-snmp on Linux (CentOS 5.2)

The default configuration of net-snmp is very secure: it allows public access to the system OID only. If you try to access any other OID, it gives the error: No Such Object available on this agent at this OID. This article shows how to set up a basic net-snmp with access control.

$snmpwalk -v 2c localhost -c public system
SNMPv2-MIB::sysDescr.0 = STRING: Linux centos-ks 2.6.18-92.el5 #1 SMP Tue Jun 10 18:49:47 EDT 2008 i686

$ snmpwalk -v 2c localhost -c public interfaces
IF-MIB::interfaces = No Such Object available on this agent at this OID

NET-SNMP version 5.3.1 Centos 5.2

#=== sample /etc/snmp/snmpd.conf
- It is important to comment out any default statements above these lines, because the access decision is based on the first match.

## sec.name source community
com2sec mynetwork public
com2sec mynetwork public

## group.name sec.model sec.name
group MyROGroup v1 mynetwork
group MyROGroup v2c mynetwork

## incl/excl subtree mask
view all included .1

## context sec.model sec.level prefix read write notif
access MyROGroup "" any noauth exact all none none

**Updated:  28 March 2011

The above statements can be simplified as:
rocommunity  public  .1
rocommunity  public  .1

NOTE:rocommunity can't restrict SNMP version, it allows  all versions:v1 and v2c

#== Troubleshooting
- Snmpd still starts despite syntax errors, which makes troubleshooting difficult. But if you start it with DEBUG, it will warn you about any errors
/usr/sbin/snmpd -LE 7 -p /var/run/snmpd.pid -a

- By default, snmpd looks for modules in /usr/share/snmp/mibs. The following command will check the loaded modules
snmpd -Dmib_init

- If you don't know the OID of an object, snmptranslate can help. The following demonstrates how to find an object name and its OID

$ snmptranslate -Ts | grep interface

$ snmpget -v 1 localhost -c public interfaces.ifTable.ifEntry.ifDescr.2
IF-MIB::ifDescr.2 = STRING: eth0

$ snmptranslate .iso.org.dod.internet.mgmt.mib-2.interfaces.ifTable.ifEntry.ifDescr.2 -On

$ snmpget -v 1 localhost -c public .
IF-MIB::ifDescr.2 = STRING: eth0

Tuesday, March 24, 2009

How to display content of files along with file names?

Sometimes it is useful to display content of files along with file names. egrep or pr can do the trick

$ cat 1.txt

$ cat 2.txt

#==cat can't display file name

$ cat *.txt

#==display all with wildcard filter *

$ egrep \* *.txt

#==The header of pr displays filename, sed is used to chop blank lines

$ pr *.txt | sed '/^$/d'
2009-03-25 03:17 1.txt Page 1
2009-03-25 03:17 2.txt Page 1
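
awk is a third option, since it exposes the current file name as FILENAME. A minimal sketch, with a hypothetical function name show_with_names:

```shell
# Hypothetical helper: prints every line of the given files prefixed
# with the name of the file it came from.
show_with_names() {
    awk '{ print FILENAME ":" $0 }' "$@"
}

# Usage (not executed here): show_with_names *.txt
```

Unlike pr, this labels every line, which is convenient when the output is piped into grep afterwards.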

When some processes stop Solaris Zone from being shutdown

If a Solaris Zone takes a long time to shut down, you may need to examine the processes marked with '*' in the state column.

$ svcs -a | grep sendmail
*online Mar_09 svc:/network/smtp:sendmail

#==find the process id of the offending process
$svcs -p sendmail
*online Mar_09 svc:/network/smtp:sendmail
Mar_09 309 sendmail
Mar_09 310 sendmail

#==Then kill with kill cmd

#==If it happens quite often, you may find the following script handy.

#!/bin/sh
# NOTE: the original listing was truncated; lines marked "assumed" are reconstructed guesses.
SVCNAMES="$*"
DELAY=10   # assumed: seconds to wait between checks

getpid () {
SVC=$1
PID=0
/usr/bin/svcs $SVC >/dev/null
if [ $? -ne 0 ];then
return 1
fi
# the first line of "svcs -Hp" is the service itself, the rest are its processes
PID=`/usr/bin/svcs -Hp $SVC|tail +2 | awk '{print $2}'| tail -1`
if [ -z "$PID" ];then
PID=0   # assumed: treat "no process" as PID 0
fi
return 0
}

[ -z "$SVCNAMES" ] && echo "Usage $0 svcname1 [svcname2] .." && exit 1

for SVCNAME in $SVCNAMES   # assumed: loop over all service names given
do
CNT=0
getpid $SVCNAME

if [ $PID -lt 1 ];then
echo "No pid found for $SVCNAME"
exit 1
fi

while [ $PID -gt 1 ]
do
echo "Delay for " $DELAY " secs"
sleep $DELAY;
getpid $SVCNAME
CNT=`expr $CNT + 1 `
if [ $CNT -le 7 ] && [ $PID -gt 1 ];then
echo "Service $SVCNAME is still running after " `expr $CNT \* $DELAY ` "secs, Gracefully kill it: kill $PID"
kill $PID
elif [ $CNT -gt 7 ] && [ $PID -gt 1 ];then
echo "Service $SVCNAME is still running after " `expr $CNT \* $DELAY ` "secs, Forcefully kill it: kill -9 $PID"
kill -9 $PID
fi
done

sleep 2;
/usr/bin/svcs $SVCNAME | grep disabled

if [ $? -eq 0 ]; then
echo "Service is stopped"
else
echo "Service is still running, please kill it manually"
exit 1
fi
done


Friday, March 20, 2009

Integrating Nagios plugin with OpenNMS

OpenNMS is a highly scalable enterprise-level management system. I like its versatile built-in monitors, auto-discovery and graphing ability. It can also work with Nagios plugins to use any customized monitor.

Install OpenNMS


Setup NRPE on client

yum install nrpe nagios-plugins-nrpe nagios-plugins

#==create a test script
$vi /usr/lib/nagios/plugins/check_test.sh

#!/bin/sh
STATE_OK=0   # Nagios plugin convention: exit code 0 means OK
echo "check test"
exit $STATE_OK

# make sure the nagios user has rx permission for the script.
chmod +rx check_test.sh

#vi /etc/nagios/nrpe.cfg
allowed_hosts=,IP of OpenNMS

#==start nrpe
service nrpe start

#==Now test it manually. It is important to run the check as user nagios, not root

sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -n -H localhost -c check_test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H localhost -c check_test

-n = Do not use SSL. If nrpe doesn't support both modes, it is important to set the usessl value in the opennms config file.

Setup NRPE on OpenNMS system:

yum install nagios-plugins-nrpe nagios-plugins
#==no configuration needed here, first run a manual test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -n -H remote-host -c check_test
sudo -u nagios /usr/lib/nagios/plugins/check_nrpe -H remote-host -c check_test
-n = Do not use SSL. If nrpe doesn't support both modes, it is important to set the usessl value in the opennms config file.

OpenNMS configuration

Two configuration files need to be modified for the newly added service.
/opt/opennms/etc/capsd-configuration.xml /* service definition for initial scan */
/opt/opennms/etc/poller-configuration.xml /* service definition for constant polling */

<protocol-plugin protocol="NRPE-test" class-name="org.opennms.netmgt.capsd.plugins.NrpePlugin" scan="on">
<property key="banner" value="*"/>
<property key="port" value="5666"/>
<property key="timeout" value="3000"/>
<property key="retry" value="2"/>
<property key="usessl" value="true"/>
<property key="command" value="check_test"/>
</protocol-plugin>

- Important Notes:
Set the usessl value depending on whether your nrpe supports SSL.
The command is used for polling; it is better to set it to your customized script (or the system built-in cmd: _NRPE_CHECK)

/opt/opennms/etc/poller-configuration.xml

<service name="NRPE-test" interval="300000" user-defined="true" status="on">
<parameter key="retry" value="3"/>
<parameter key="timeout" value="3000"/>
<parameter key="port" value="5666"/>
<parameter key="command" value="check_test"/>
<parameter key="usessl" value="true"/>
<parameter key="padding" value="2"/>
<parameter key="rrd-repository" value="/opt/opennms/share/rrd/response"/>
<parameter key="ds-name" value="nrpe-test"/>
</service>

- Important Notes:

The name attribute of the service in poller-configuration.xml needs to match the protocol attribute of the protocol-plugin in capsd-configuration.xml.
The ds-name attribute also needs to be unique for each service, or you'll find response time from one service overwriting response time from another.
You'll also need a line to map the new service to a monitor class (see at the end of the file)

<monitor service="NRPE-test" class-name="org.opennms.netmgt.poller.monitors.NrpeMonitor"/>
Restart OpenNMS for the changes to take effect
The new service should be discovered by re-scan.

If the newly added service can't be discovered, turn on debug for the discovery process capsd


# Capsd
log4j.category.OpenNMS.Capsd=DEBUG, CAPSD

If the newly added service can be discovered but has issues with polling, turn on debug for the polling process poller

# Pollers
log4j.category.OpenNMS.Poller=DEBUG, POLLERS

Wednesday, March 18, 2009

RHCE TIPS - Sitting the test

This section is easy and the proctor will let you know the result immediately.

- Compulsory Section I: It tests system maintenance. It counts for 80, which is enough for RHCE, so if you have scored 80 here, you don't need to take the next, non-compulsory question.
- Non-compulsory Section I: It tests a system booting issue. It counts for 20. If you didn't complete the compulsory section, or you are simply after a perfect score of 100, take the non-compulsory question. The proctor will re-image your PC to introduce the booting issue, so you can't go back once you have made the decision. You should be safe once you have mastered all the scenarios in
my previous post.

If you have breezed through SECTION I, don't be too joyful: the hardest part is here.
It is hard because the time is limited and there are many tasks to complete; if you get stuck on one, time quickly runs out. Secondly, no one will verify the result, so you have to check it yourself. That is quite tricky: if you misinterpret the requirement, your check method may be wrong, or a service may stop working after a reboot.

A few tips during the test:
Manage your time well and don't get stuck on one question for too long; you don't need a full score to pass RHCE.

Check the result immediately after completing a task. Don't expect to check everything at the end. You can log in to a remote Linux host to verify your result.

#User and file permission:
'su - username' to check permissions for creating/listing files
curl http://ServerName-or-IPAddress or elinks http://server
curl -x proxyip:port url
#Send mail:
echo "test" | mail -s "subject" user
# Send mail with specific sender:
telnet server 25 \n; mail from: user@x \n; rcpt to: user@y \n;data \n; subject: "subject x" \n; "text body "\n; .
#Receive local email:
mail or mail -f mailbox-filename
#Receive remote email:
mutt -f pop://server or mutt -f imap://server
smbclient //ip/share -U username /* Because you may not be able to access the share, even if smbclient -L IP -U shows the share */
#test firewall
nc -z IP 1-200 /* scan a remote host's open ports */
nc -v IP 25 or telnet IP 25 /* check availability of 1 port */

- Security tasks:
RHCE tasks are about restricting access to services. There are many options to achieve the result, and it is up to you which one to use: the service's native support, PAM, tcp-wrapper or iptables. Be careful using iptables: you should use an open firewall, which means accepting everything except what is specifically denied. You don't want your firewall to deny the services you completed earlier.

- Need help?
Unfortunately, you don't have internet access during the test. You can only rely on the local man pages and documents. So while preparing for the test, avoid looking up answers on the internet straight away; try the local man pages and docs first. For example if you forget the format for ifcfg-ethx.cfg, the syntax is documented here:

- Lastly:
In the last minutes, you should reboot your PC and check that the services are still running after the reboot. As a precaution, always begin each task with this command: chkconfig svcname on

Authenticate Linux Clients with Active Directory

Great article explaining how to authenticate Linux clients with Active Directory using three authentication strategies

Using LDAP Authentication
Using LDAP and Kerberos
Using Winbind


Tuesday, March 17, 2009

Learned one critical rule of Openldap's slapd.conf format

One critical rule of Openldap's slapd.conf format : no leading space.

Openldap is easy to configure: you just need to customize three params: suffix, rootdn and rootpw
# /etc/openldap/slapd.conf

database bdb
suffix "dc=example,dc=com"
rootdn "cn=root,dc=example,dc=com"
 rootpw {SSHA}Ok/uoTJYELAj346giEh2mdvmiE5etgcg
The above is my initial config (note the leading space before rootpw); the rootpw is generated by slappasswd

# slappasswd  -s pass123

I started the ldap service and it was fine, but when I did an ldapsearch I got the "ldap_bind: Invalid credentials (49)" error

# ldapsearch -x -h -D "cn=root,dc=example,dc=com" -w pass123
ldap_bind: Invalid credentials (49)

The rootdn and rootpw are definitely correct, so why? Did you notice the space before rootpw? It is the culprit. The same search returned OK after I deleted the leading space.

Another common error, "ldap_sasl_interactive_bind_s: No such attribute (16)", will appear if you omit -x (simple authentication)

# ldapsearch   -h -D "cn=root,dc=example,dc=com" -w pass123
ldap_sasl_interactive_bind_s: No such attribute (16)

Openldap tested is slapd 2.3.27

Sunday, March 15, 2009

RHCE TIPS - Preparation

    Reference book:
    RHCE Red Hat Certified Engineer Linux Study Guide (Exam RH302) 5th edition by Michael Jang.
    if something is not clear in the book, read official Red Hat Enterprise Linux Documentation

    Lab Setup:
    Install CentOS on Virtualbox

    Virtualbox is free open source virtualization software, an alternative to Vmware. You need 2 CentOS instances to prepare for the RHCE lab. The networking in Virtualbox is very different from Vmware.

    -Virtualbox Networking Type:
    --NAT: your guest OS can access the outside network through NAT provided by Virtualbox, but your host OS can't access the guest OS
    --Host interface networking: Host and guest can communicate with each other, but the guest can't access the outside network unless you set up NAT manually on the Host OS
    --Internal network: Guest OSes can communicate with each other within the SAME network name (something like a VLAN ID), but not with the Host OS.

    -Centos ServerA network setup
    1*NAT adapter for internet access to do yum.
    1*Host network adapter for your host to ssh to ServerA
    1*Internal Network adapter to communicate with ServerB

    -Centos ServerB network setup
    1* Internal Network adapter to communicate with ServerA (join the SAME network name of ServerA )

    How can ServerB access outside network? Point the default GW to serverA, and turn on ip forwarding on ServerA.
    How can my Host OS access ServerB?
    1. ssh to serverA first then jump from serverA to ServerB
    2. -set up port forwarding or 1 to 1 static mapping on ServerA
    --Forwarding port 200 to ssh of ServerB

    iptables -t nat -A PREROUTING -p tcp -d ServerA-Host-NIC-IP --dport 200 -j DNAT --to-destination ServerB-IP:22
    --Static 1 to 1 mapping
    Assign a secondary IP to ServerA's host NIC, then
    iptables -t nat -A PREROUTING -p tcp -d ServerA-SEC-NIC-IP -j DNAT --to-destination ServerB-IP
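
When practising with several forwarded ports, it can be handy to print the rule first and review it before running it as root. A minimal sketch, assuming a hypothetical helper named dnat_rule; it only builds the command text and does not touch iptables:

```shell
# Hypothetical helper: prints the port-forwarding rule for review.
dnat_rule() {
    # $1 = ServerA host-NIC IP, $2 = port on ServerA,
    # $3 = ServerB IP,          $4 = port on ServerB
    echo "iptables -t nat -A PREROUTING -p tcp -d $1 --dport $2 -j DNAT --to-destination $3:$4"
}

# Usage (not executed here): dnat_rule ServerA-Host-NIC-IP 200 ServerB-IP 22
# IP forwarding on ServerA must also be on: sysctl -w net.ipv4.ip_forward=1
```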

    Last but not least, read through each chapter and practice it in the LAB; you never know if it works until you really do it! The RHCE exam is all about security, hence I suggest jumping to the security chapter before reading about networking services. Then apply your security knowledge (pam/tcp-wrapper/iptables/selinux) to each network service you read about later.

    Saturday, March 14, 2009

    RHCE Notes - Troubleshooting booting issue

    The booting issue is an optional question in Section I. The proctor will re-image your PC to introduce the booting issue, and you will be given a rescue CD to fix it.

    It is easy to troubleshoot a Linux boot issue if you break it intentionally at each step, observe the symptom and find the fix.

    #==Linux boot order
    The BIOS ->MBR->Boot Loader->Kernel->/sbin/init->
    /etc/rc.d/rcX.d/ #where X is run level in /etc/inittab
    run the scripts starting with K, then the scripts starting with S
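
To preview the order for a given run level, the script names can be sorted the same way. A minimal sketch, assuming names are read one per line on stdin and that scripts within each group run in plain lexical order; the function name rc_order is hypothetical:

```shell
# Hypothetical helper: prints the K (kill) scripts first, then the
# S (start) scripts, each group in lexical order.
rc_order() {
    names=$(cat)
    printf '%s\n' "$names" | grep '^K' | LC_ALL=C sort
    printf '%s\n' "$names" | grep '^S' | LC_ALL=C sort
}

# Usage (not executed here): ls /etc/rc.d/rc3.d/ | rc_order
```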

    #==Linux rescue env
    boot from the first linux cd, then type linux rescue
    linux rescue will try to mount all partitions. However, if there is an error and only some partitions are mounted, running chroot /mnt/sysimage now will lose the /dev and /proc mounts. Here is how to transfer these mounts:
    mount -o bind /dev /mnt/sysimage/dev
    mount -o bind /proc /mnt/sysimage/proc

    The Linux rescue env supports both software RAID and LVM. Normal LVM commands, e.g. vgdisplay, are not available, but they can be accessed through the LVM "master" command, e.g. "lvm vgdisplay"

    #== Grub boot manager
    = go to grub cmd prompt by pressing c at boot menu
    = find the root partition, 2 methods
    grub> root
    (hd0,0) Filesystem type is ext2fs, partition type 0x83
    grub> find /grub/stage1
    =list files/dirs in current drive
    cat / #type cat SPACE / TAB, it will list all files/dirs just like ls
    = display contents of the file
    cat /grub/grub.conf
    = now you can boot interactively by type kernel and initrd commands from grub.conf

    #==Restore missed file from RPM
    #cd /tmp
    #rpm2cpio initscripts-7.93.11.EL-1.i386.rpm | cpio -icumvd ./etc/inittab
    #rpm2cpio initscripts-7.93.11.EL-1.i386.rpm >init.cpio /* file is ./etc/inittab not /etc/inittab */

    List contents: cpio -tv < init.cpio

    install the package to an alternative location, then copy the file
    rpm -i --root /var/tmp/a X.rpm

    #== MBR corrupted.
    MBR has 512 bytes in total
    440 Executable code section (440 + 4 + 2 = the first 446 bytes)
    4 Optional Disk signature
    2 Usually nulls
    64 Partition table #if this is overwritten, there is no way to recover unless you backed up the partition table or re-partition using the exact same layout
    2 MBR signature
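
The 2-byte signature at the end of the sector is 0x55 0xAA, which gives a quick sanity check before and after experimenting. A minimal sketch, assuming a hypothetical function name mbr_signature_ok and a disk image or device path as its argument (reading a real device usually needs root):

```shell
# Hypothetical helper: exit status 0 when bytes 510-511 of the given
# image or device are the MBR signature 55 aa.
mbr_signature_ok() {
    # $1 = path to a disk image or device, e.g. /dev/hda
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "55aa" ]
}

# Usage (not executed here): mbr_signature_ok /dev/hda && echo "signature present"
# A full-sector backup before experimenting: dd if=/dev/hda of=/tmp/mbr.bak bs=512 count=1
```

The check only reads two bytes, so it is safe to run on a live disk.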

    Corrupt the MBR intentionally: dd if=/dev/zero of=/dev/hda bs=446 count=1 #The MBR is at the start of the whole disk (not partition hda1). It has 512 bytes; the first 446 bytes are exec code. DON'T overwrite the whole 512 bytes because they include the partition table data.
    ERR: "no bootable media found", "Missing operating system" or "Operating System Not Found"
    boot from cd, run "linux rescue", let it mount the linux partitions automatically.
    chroot /mnt/sysimage then grub-install /dev/hda
    boot from cd, run "linux rescue"; if the linux partitions failed to mount,
    mount manually. sfdisk -l; e2label to find the boot partition
    mkdir /a; mount /dev/hda1 /a; ln -s /usr/sbin/grub /sbin/grub; grub-install --root-directory=/a /dev/hda #it is hda not hda1

    #= root (/)was not mounted
    mount couldn't find file system /dev/root
    switchroot mount failed...
    Error 2 mounting none;exec of init ((null)) failed!!!
    kernel /vmlinuz-test ro root=LABEL=/
    /* root=LABEL=/ mounts using the label, or root=/dev/sda3 mounts with the direct dev-name */

    #= not loading initrd image
    VFS: Cannot open root device "Label=/1" or unknown-block(0,0)
    Please append a correct "root=" boot option
    Kernel panic: VFS: Unable to mount root fs on unknown-block(0,0)
    1) The kernel doesn't support the file system. Compile the kernel with FS support built in, NOT as a module
    2) initrd was not loaded. Add initrd=... in grub.conf
    linux rescue, then chroot /mnt/sysimage and create the initrd file
    mkinitrd /boot/initrd-filename `uname -r` #make the initrd file manually

    #==/sbin/init problem.
    Switching to new root
    kernel panic -not syncing :Attempted to kill init
    switching to new root
    /bin/sh: ro : no such file or directory
    /* boot to rescue, check /sbin/init. restore from rpm package*/

    #== /etc/inittab not found
    At the "enter run level" prompt enter s, or at the grub menu append s or init=/bin/sh or emergency, then restore inittab from the source RPM

    Passed RHCE

    I passed RHCE today. I will be writing some tips and notes.

    Here is my score report.

    RHCE requirements: completion of compulsory items (50 points)
    overall section score of 80 or higher
    RHCT requirements: completion of compulsory items (50 points)

    Compulsory Section I score: 50.0
    Non-compulsory Section I score: 50.0
    Overall Section I score: 100

    RHCE requirements: score of 70 or higher on RHCT components (100 points)
    score of 70 or higher on RHCE components (100 points)

    RHCT requirement: score of 70 or higher on RHCT components (100 points)

    RHCT components score: 92.6
    RHCE components score: 86.7

    RHCE Certification: PASS