
Tuesday, February 8, 2011

So much for P67/H67 urges to impulse buy


I had been waiting several months for Sandy Bridge to launch.  My current PC is a 4 year old Core 2 Duo and runs pretty well with the tweaks I've made over the years, despite the seemingly broken output stage of the integrated sound card.  Seems like ASUS screwed something up there.


I've been looking at Sandy Bridge and set my sights on the i5-2500K, but could never find a motherboard that gave me exactly what I wanted: something that would serve me well in my desktop for 2-3 years and then work well as an HTPC.  That means a Micro ATX form factor, so the board can be passed down to my HTPC in a few years; I have no burning desire for SLI graphics or anything needing more than the expansion slots provided on most mATX boards anyway.  I also wanted something with a good EFI BIOS, and ASUS seems to be the only one really doing that right.  As for networking, I would much rather have the Intel Gigabit Ethernet PHY than the Realtek stuff.  Then add in the standard things that are hard not to get, such as USB 3.0 and 4-6 SATA ports.  Oh, and it would be nice if the board had embedded graphics for when it gets demoted to HTPC duties, but that's impossible for now, because that would mean an H67, and I wanted a P67 for its overclocking capabilities.


Then last week happened and Intel announced their $700 million USD recall.  Pretty big oops for a chipset that wasn't even innovative or pushing any real new features.  Now I'm stuck waiting for the Z68 chipset at minimum, which promises to deliver both embedded graphics and overclocking, a win for desktop use and for HTPC use alike.  Hopefully someone will throw 6 SATA ports and an Intel PHY into a mATX form factor and I'm sold.  Most likely the only one who will do this is Intel, and that's fine, but I hope they spice up their BIOS a bit for Z68.


I guess it all boils down to a board with this:

  1. The Z68 chipset
  2. Micro ATX form factor
  3. Intel Gigabit LAN PHY
  4. Decent EFI BIOS
  5. USB 3.0 and 6 SATA ports



And so I wait...  I’m begging someone to make something that fills my needs and takes my money.

Thursday, January 20, 2011

Major Linux Speed Up

A few years ago I maxed out the RAM on my Intel Core 2 Duo system at 8GB.  RAM was super cheap at the time, so why not?  Reality is that most of that RAM doesn't get used for much unless I'm running several virtual machines with large RAM allocations.

However, this past weekend I saw a post about using a RAM disk to speed up your web browser.  Okay, cool idea, but their write-up seemed overcomplicated.  I figured I could do better, so I logged out of my Gnome desktop, logged in to a virtual terminal, and added the following to my /etc/fstab:
tmpfs      /home/user/.cache  tmpfs    size=1G     0   0
Followed by a quick mount /home/user/.cache, and so far the speed-up has been huge.  I've been itching to replace my 4 year old Core 2 Duo with a new Sandy Bridge setup, but this may let me hold out a while longer, at least until the Intel Z68 chipset comes out, or even as long as the Intel Ivy Bridge debut.

What that simple line does is create a 1GB tmpfs, aka RAM file system, for everything in the cache folder.  Consequently Chromium keeps its cache there, as does Compiz.  I look forward to more programs just using the directory and speeding everything up a little bit.
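Two quick sanity checks, assuming your username is "user" as in the fstab line above, confirm the directory really is living in RAM (df should report "tmpfs" as the filesystem):

$ mount | grep /home/user/.cache
$ df -h /home/user/.cache

One caveat worth knowing: tmpfs contents vanish on reboot, which is fine for a cache but would be wrong for anything you want to keep.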

Simple task, huge difference.

Sunday, June 13, 2010

New filesystem and harddrives?

I have been planning to upgrade my main PC, which doubles as my HTPC fileserver, at the end of the year when Intel releases Sandy Bridge.  A new motherboard, processor, an SSD, and some DDR3 were the original plan.  However, in light of my recent harddrive fiasco, my 250GB Seagates may be retired sooner than originally planned.  My SMART reallocated sector count (SMART ID# 5) is at 7, and I anticipate it growing.  Until it becomes an issue I intend to keep using the drive, and send it out a few months before the warranty expires in 2011.
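If you want to watch the same attribute on your own drives, smartctl makes it a one-liner (substitute your own device node):

$ sudo smartctl -A /dev/sdb | grep -i reallocated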

That said, I've been reconsidering my entire filesystem approach.  Originally I had the following setup:

  • /dev/sd{a,b}1
    • /boot partition consisting of /dev/sda1 and /dev/sdb1
  • /dev/sda2
    • Win7 system partition
  • /dev/sdb2
    • Linux root partition
  • /dev/sda3
    • Linux RAID0 striping + LVM

Following the harddrive failure, Win7 has been reinstalled on a partition of my 1TB harddrive (which is mostly my backup drive for Linux data), and my Linux root partition became the more stable /dev/sda2, since sdb is on its way out.

That said, I've been looking for a way to pool my growing number of old disks into a backup filesystem (I have a few old 120GB - 300GB PATA drives lying around).  I've looked at things like FlexRAID and unRAID, but they don't seem all that well thought out, and they're targeted more at Windows HTPC users.  ZFS has been an industry buzzword for some time, but it lacks a native kernel-level implementation in Linux, and I fear it won't let me add/remove drives on the fly (I don't want to use RAIDZ).

This leads me to btrfs, or "butter fs".  So far I've gathered that it is seriously lacking in things like man pages, but for the most part it seems to work as a simple file system.  I've set up sda3 and sdb3 (my former Linux RAID0 + LVM) as a btrfs volume.  My home directory (which is backed up daily) has been running from it, and so far it works as just a file system.  It has the filesystem metadata mirrored on both drives and the data striped.  I'm not sure of an effective way to benchmark the filesystem other than just using it, and so I will, until my Seagate sdb drive is on its last legs, leading me to RMA it and purchase some new Samsung Spinpoint 1TB F3 HD103SJs (the current hot ticket and pretty cheap at $70 shipped from Newegg).  At that point, I'll pick something stable and go back to not worrying about it.

Until then btrfs has some other appealing features I'm looking to test out:
  • Compression
  • De-duplication
  • Snapshots
  • Multi-device file-system support (i.e. the filesystem knows about both drives, rather than letting RAID masquerade them as a single device)
First off, compression is just what it sounds like: btrfs uses zlib to compress some files on the fly as it writes them, resulting in increased read/write speeds for compressible plain-text files.
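From what I've gathered, compression is enabled per-mount, along these lines (a sketch based on the thin documentation; device and mountpoint are from my setup above):

# mount -o compress /dev/sda3 /home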

I know I have multiple copies of the same file scattered all over my drive, and de-duplication is an easy way for me to save some space without doing anything.  Good deal.

Snapshots would be a handy way to protect me from an `rm -rf ./dir` followed by an "OH SHIT".  Although my current nightly rsyncs to a backup drive make me feel plenty safe.
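If I've read the young btrfs-progs tooling right, taking a snapshot would be a one-liner like this (very much a sketch; the syntax may well change as the tools mature):

# btrfsctl -s /home/snap-20100613 /home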

And finally, the most important is the multi-device file-system support.  This would enable me to replace md + LVM for my primary storage and it would help me to achieve my goal of pooling old disks for use as a backup.  I'm still having a hard time dropping my old way of RAID thinking where all the drive properties have to match.

So, for example, say I have a ghetto RAID setup (which I do for the purposes of testing), like:
  • /dev/sda3 (190GB)
  • /dev/sdb3 (190GB), which may be dying
  • /dev/tb1/btrfstest (70GB), an LVM volume on a 1TB WD Black
You could start out by striping data and mirroring metadata like this:

# mkfs.btrfs -L newhome /dev/sda3 /dev/sdb3
# mount /dev/sda3 /home
 You then copy your home data over using rsync...
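For concreteness, a minimal sketch of that copy, assuming the old home data is mounted at /mnt/oldhome (a made-up path):

# rsync -aH /mnt/oldhome/ /home/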
Now, at this point I wanted to remove /dev/sdb3 from the setup, but was shot down with: 
# btrfs-vol -r /dev/sdb3 /home
btrfs: unable to go below two devices on raid1

Okay, so let's add that logical volume from my 1TB drive just for kicks and then re-stripe/balance the data for performance.


# btrfs-vol -a /dev/tb1/btrfstest /home
ioctl returns 0
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda3             461G   56G  402G  13% /home
# btrfs-vol -b /home

That easy, huh?  All without unmounting the filesystem (and in fact, all while writing this).  Now let's pretend sdb3 starts dying again and we want it out, as we tried before:

# btrfs-vol -r /dev/sdb3 /home
The filesystem grinds away and after some time it finishes.  Running dmesg shows it working:
btrfs: found 2594 extents
btrfs: found 2594 extents
btrfs: relocating block group 84854964224 flags 9
Easy enough, and impressive to say the least.  The only real question now is: if I can add/delete drives to btrfs filesystems on the fly, should I go for data striping, possibly incurring a latency penalty (not confirmed with btrfs's method of striping objects), or just keep the data in "single" mode for the backup volume(s)?
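If I do go the "single" route, the data and metadata profiles appear to be chosen at mkfs time, something like this (a sketch using the test devices above):

# mkfs.btrfs -L backup -m raid1 -d single /dev/sda3 /dev/sdb3 /dev/tb1/btrfstest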

Time will tell...

Seagate FAIL

I run two ST3250620AS drives as my root file system, with a Linux software RAID0 setup for my /home directory.  These drives are from Seagate's 7200.10 series, which were the first drives to switch to perpendicular recording some years ago.  This was a time when Seagate had a 5 year warranty on OEM drives and an immaculate reputation.

Starting on Friday, I heard my harddrive clicking.  A quick look at the logs revealed that sdb was dying to some degree:
Jun 11 14:21:38 core kernel: ata3.00: exception Emask 0x10 SAct 0x1 SErr 0x810000 action 0xe frozen
Jun 11 14:21:38 core kernel: ata3.00: irq_stat 0x08400000, interface fatal error, PHY RDY changed
Jun 11 14:21:38 core kernel: ata3: SError: { PHYRdyChg LinkSeq }
Jun 11 14:21:38 core kernel: ata3.00: failed command: READ FPDMA QUEUED
Jun 11 14:21:38 core kernel: ata3.00: cmd 60/60:00:7d:8d:25/00:00:10:00:00/40 tag 0 ncq 49152 in
Jun 11 14:21:38 core kernel: res 40/00:00:7d:8d:25/00:00:10:00:00/40 Emask 0x10 (ATA bus error)
Jun 11 14:21:38 core kernel: ata3.00: status: { DRDY }
Jun 11 14:21:38 core kernel: ata3: hard resetting link
Jun 11 14:21:41 core kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Jun 11 14:21:41 core kernel: ata3.00: configured for UDMA/133
Jun 11 14:21:41 core kernel: ata3: EH complete
I noticed the clicking when I took the side of my case off to look at something else, and figured maybe I'd bumped the cable.  I touched the cable and the drive seemed happy.  I wrote it off as a bad cable and replaced it later that day when I had a chance to power down the machine.  I noticed that one of the contacts was recessed a bit more than the others, so I swapped the cable and looked at my spares.  Two others were bad as well, so I just threw them out and visually inspected the replacements.

Fast forward a few hours and it's acting up again.  This time I dug deeper with smartctl and ran some tests: the first drive in the array passed without problems, but the other has some serious issues.  I downloaded Seagate's SeaTools CD and booted off of that, since my attempts at running the S.M.A.R.T. long test from Linux failed.  Running it from the CD found 2 bad sectors (on top of the 7 that were already remapped) and gave me the option to repair them, and so far so good.  See my smartctl data below.  Also note this drive is almost 4 years old but reports a lifetime of only 4718 hours... I think that's an oops on Seagate's part, as this drive has been on 24/7 since I bought it.
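For reference, kicking off the same self-tests from Linux is easy with smartmontools (the long test is the one that kept failing for me):

$ sudo smartctl -t short /dev/sdb
$ sudo smartctl -t long /dev/sdb
$ sudo smartctl -l selftest /dev/sdb

The first two start a test in the background; the third prints the self-test log you can see near the bottom of the output below.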

$ sudo smartctl -a /dev/sdb
Password:
smartctl version 5.38 [x86_64-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.10 family
Device Model:     ST3250620AS
Serial Number:    5QE0DYWW
Firmware Version: 3.AAC
User Capacity:    250,059,350,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Sun Jun 13 10:00:31 2010 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:          ( 430) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (  92) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   111   086   006    Pre-fail  Always       -       34962761
  3 Spin_Up_Time            0x0003   092   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       323
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       7
  7 Seek_Error_Rate         0x000f   085   060   030    Pre-fail  Always       -       341869071
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4720
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1031
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       119
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   056   050   045    Old_age   Always       -       44 (Lifetime Min/Max 41/44)
194 Temperature_Celsius     0x0022   044   050   000    Old_age   Always       -       44 (0 14 0 0)
195 Hardware_ECC_Recovered  0x001a   077   053   000    Old_age   Always       -       14538
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 TA_Increase_Count       0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 119 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 119 occurred at disk power-on lifetime: 4715 hours (196 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 55 1a 5e e0  Error: UNC at LBA = 0x005e1a55 = 6167125

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 78 13 5e e0 00      01:06:52.649  READ VERIFY SECTOR(S) EXT
  42 00 00 78 0b 5e e0 00      01:06:52.631  READ VERIFY SECTOR(S) EXT
  42 00 00 78 03 5e e0 00      01:06:52.618  READ VERIFY SECTOR(S) EXT
  42 00 00 78 fb 5d e0 00      01:06:52.600  READ VERIFY SECTOR(S) EXT
  42 00 00 78 f3 5d e0 00      01:06:52.587  READ VERIFY SECTOR(S) EXT

Error 118 occurred at disk power-on lifetime: 4715 hours (196 days + 11 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 01 77 bb 1b e0  Error: UNC at LBA = 0x001bbb77 = 1817463

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  42 00 00 00 b8 1b e0 00      00:57:43.347  READ VERIFY SECTOR(S) EXT
  42 00 00 00 b0 1b e0 00      00:57:43.334  READ VERIFY SECTOR(S) EXT
  42 00 00 00 a8 1b e0 00      00:57:43.317  READ VERIFY SECTOR(S) EXT
  42 00 00 00 a0 1b e0 00      00:57:43.304  READ VERIFY SECTOR(S) EXT
  42 00 00 00 98 1b e0 00      00:57:43.287  READ VERIFY SECTOR(S) EXT

Error 117 occurred at disk power-on lifetime: 4711 hours (196 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 55 1a 5e ee  Error: UNC at LBA = 0x0e5e1a55 = 241048149

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 1a 5e ee 00      05:47:25.395  READ DMA
  27 00 00 00 00 00 e0 00      05:47:23.485  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      05:47:23.427  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      05:47:23.426  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      05:47:23.426  READ NATIVE MAX ADDRESS EXT

Error 116 occurred at disk power-on lifetime: 4711 hours (196 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 55 1a 5e ee  Error: UNC at LBA = 0x0e5e1a55 = 241048149

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 1a 5e ee 00      05:47:19.397  READ DMA
  27 00 00 00 00 00 e0 00      05:47:23.485  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      05:47:23.427  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      05:47:23.426  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      05:47:23.426  READ NATIVE MAX ADDRESS EXT

Error 115 occurred at disk power-on lifetime: 4711 hours (196 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 55 1a 5e ee  Error: UNC at LBA = 0x0e5e1a55 = 241048149

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 1a 5e ee 00      05:47:19.397  READ DMA
  27 00 00 00 00 00 e0 00      05:47:19.396  READ NATIVE MAX ADDRESS EXT
  ec 00 00 00 00 00 a0 00      05:47:19.338  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00      05:47:19.338  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00      05:47:17.436  READ NATIVE MAX ADDRESS EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4718         -
# 2  Short offline       Completed without error       00%      4716         -
# 3  Short offline       Completed: read failure       90%      4716         241048149
# 4  Short offline       Completed: read failure       90%      4714         241048149
# 5  Short offline       Completed: read failure       90%      4712         241048149
# 6  Short offline       Completed: read failure       90%      4712         241048149
# 7  Short offline       Completed: read failure       90%      4710         169589623
# 8  Extended offline    Completed: read failure       90%      4706         169589623
# 9  Extended offline    Completed without error       00%      4400         -
#10  Short offline       Completed without error       00%      4397         -

Wednesday, June 9, 2010

DreamHost and Duplicity Backups

On Monday I started backing up my home computers to DreamHost backup using duplicity.

DreamHost gives their shared hosting users 50GB of backup space for personal files (they offer "unlimited" diskspace on their webservers, but only for files served on the web).  My personal files aren't to be accessed by anyone but me; I just wanted an offsite backup.

I researched backup solutions, initially turning to my rsync scripts, tried and true over the past 10+ years.  However, I don't trust DreamHost to keep my data secure, so I *need* encryption.  I narrowed the choices down to either a TrueCrypt image that I could mount in Linux, then split and rsync, or duplicity.

I considered TrueCrypt for quite a while, as its community following is rather impressive.  However, syncing a single large image wasn't feasible.  Splitting the 30-50GB image into smaller pieces (250MB, say) using the UNIX split command did work with rsync: on a 1GB test file I modified parts of, only the changed pieces were transferred.  However, this meant I would always need to keep 50GB of space free just to hold the split copy that gets synced upstream.

I then turned to duplicity.  Initially, I didn't like the idea of it using tar (behind the scenes) encrypted with GPG.  I'm a long-time user of rsync, where if I need one or two files I can access them instantly.  However, I can't remember the last time I needed to do that, so I bit the bullet and tried it.  GPG is also (in my opinion) more secure than TrueCrypt.

On Monday I gave it a shot, backing up my system's /etc directory and /home for the time being.  A --dry-run in duplicity calculated approximately 31GB of data to be copied... do some math with Wolfram Alpha against my ~700Kb/s upload and we get 4 days 2 hours.  Lovely; my system will be hogging my Internet connection for a few days.
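For the curious, the invocation looks roughly like this (a sketch: the hostname, username, and remote path are made-up placeholders, not DreamHost's actual values):

$ duplicity --dry-run /home/user scp://user@backup.dreamhost.com/duplicity/home
$ duplicity /home/user scp://user@backup.dreamhost.com/duplicity/home

The first command just sizes up the job; the second does the real transfer, performing a full backup the first time and incrementals thereafter.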

Some more research (via Google, not firsthand) suggests that duplicity won't pick up where it left off should it get interrupted, one of the things rsync does very well since it works on a file-by-file basis.  So, I'll just let it run.

After that I'll do incremental backups, then a full backup next month.  I also need to research GPG's compression algorithms, which can be passed through by adding "--gpg-options='--compress-algo=bzip2 --bzip2-compress-level=9'" to duplicity's options.

Oh yeah, and I need to make sure I can restore the backup.
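Testing that should be as simple as reversing the direction, restoring into a scratch directory rather than over live data (same placeholder URL as above):

$ duplicity restore scp://user@backup.dreamhost.com/duplicity/home /tmp/restore-test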