[svlug] dma_intr <-- Strange DMA hard drive error with Via chipset

James Sparenberg james at linuxrebel.us
Fri Dec 17 12:30:46 PST 2004


On Fri, 2004-12-17 at 02:34, Marat BN wrote:
> Dudes,
> 
> We're getting a serious problem with a Via chipset system running
> dual hard drives with Logical Volume Manager (LVM) in DMA mode.  From
> time to time, we're getting the following errors dumped onto the
> console and into the kernel log:
> 
> *********************
> Nov 30 14:49:38 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 30 14:49:38 localhost kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> Nov 30 14:49:38 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 30 14:49:38 localhost kernel: hda: dma_intr: error=0x04 { DriveStatusError }
> Nov 30 14:51:55 localhost smartd[1174]: Device: /dev/hda, ATA error count increased from 594 to 596
> ***************
> 
> It appears there is a function in the kernel called dma_intr which
> prints out the error messages above.  But what do you think these
> error messages mean?  The system also tends to crash frequently.
> Check out the following:
> 
> *************************
> Nov 30 15:40:49 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 30 15:40:49 localhost kernel: hda: dma_intr: error=0x84 { DriveStatusError BadCRC }
> Nov 30 15:40:49 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> Nov 30 15:40:49 localhost kernel: hda: dma_intr: error=0x04 { DriveStatusError }
> Nov 30 15:56:22 localhost sshd(pam_unix)[4343]: session opened for user guard by (uid=500)
> Nov 30 15:57:27 localhost sshd(pam_unix)[4343]: session closed for user guard
> Nov 30 15:57:37 localhost sshd(pam_unix)[4415]: session opened for user guard by (uid=500)
> Nov 30 15:59:53 localhost su(pam_unix)[4593]: session opened for user root by guard(uid=500)
> Nov 30 16:01:03 localhost su(pam_unix)[4593]: session closed for user root
> Nov 30 16:01:23 localhost kernel: Unable to handle kernel paging request at virtual address 89868286
> Nov 30 16:01:23 localhost kernel:  printing eip:
> Nov 30 16:01:23 localhost kernel: c0135e87
> Nov 30 16:01:23 localhost kernel: *pde = 00000000
> Nov 30 16:01:23 localhost kernel: Oops: 0000
> Nov 30 16:01:23 localhost kernel: CPU:    0
> Nov 30 16:01:23 localhost kernel: EIP:    0060:[<c0135e87>]    Not tainted
> Nov 30 16:01:23 localhost kernel: EFLAGS: 00010093
> Nov 30 16:01:23 localhost kernel:
> Nov 30 16:01:23 localhost kernel: EIP is at (2.4.20-6crusoe)
> Nov 30 16:01:23 localhost kernel: eax: 89868286   ebx: 00000102   ecx: c1a4ef18   edx: c1a4ef28
> Nov 30 16:01:23 localhost kernel: esi: 00000008   edi: 00000000   ebp: c1a4ef84   esp: cb1bfe2c
> Nov 30 16:01:23 localhost kernel: ds: 0068   es: 0068   ss: 0068
> Nov 30 16:01:24 localhost kernel: Process AGM (pid: 4837, stackpage=cb1bf000)
> Nov 30 16:01:24 localhost kernel: Stack: c030e458 00000004 c1a4ef18 00000000 00000007 00000007 00002202 000001d2
> Nov 30 16:01:24 localhost kernel:        000001d2 00000000 c0138c9e c1a4c12c 000001d2 cb1be000 00000001 c01392e1
> Nov 30 16:01:24 localhost kernel:        000001d2 c027efcc 00000300 c013a87f c027efc0 00000000 00000001 00000001
> Nov 30 16:01:24 localhost kernel: Call Trace:   [<c0138c9e>]  (0xcb1bfe54))
> Nov 30 16:01:24 localhost kernel: [<c01392e1>]  (0xcb1bfe68))
> Nov 30 16:01:24 localhost kernel: [<c013a87f>]  (0xcb1bfe78))
> Nov 30 16:01:24 localhost kernel: [<c012ca68>]  (0xcb1bfeb4))
> Nov 30 16:01:24 localhost kernel: [<c012cf71>]  (0xcb1bfed8))
> Nov 30 16:01:24 localhost kernel: [<c0115bf2>]  (0xcb1bff08))
> Nov 30 16:01:24 localhost kernel: [<c012ee20>]  (0xcb1bff2c))
> Nov 30 16:01:24 localhost kernel: [<c0123b3f>]  (0xcb1bff4c))
> Nov 30 16:01:24 localhost kernel: [<c011fd3c>]  (0xcb1bff78))
> Nov 30 16:01:24 localhost kernel: [<c011fc5d>]  (0xcb1bff7c))
> Nov 30 16:01:24 localhost kernel: [<c010a92f>]  (0xcb1bffa0))
> Nov 30 16:01:24 localhost kernel: [<c0115a70>]  (0xcb1bffb0))
> Nov 30 16:01:24 localhost kernel: [<c0109368>]  (0xcb1bffb8))
> Nov 30 16:01:24 localhost kernel:
> Nov 30 16:01:24 localhost kernel:
> Nov 30 16:01:24 localhost kernel: Code: 8b 00 43 39 d0 75 f9 8b 44 24 08 89 da 8b 78 24 8b 40 44 89
> **********************************
> We suspected that the problem with hard drive DMA access had
> corrupted the kernel and executables, causing them to crash.  However,
> we checked the md5sums of the kernel and several executables (even
> though executables should not cause a total system crash like the one
> above), and found that they perfectly match the corresponding md5sums
> on sister systems.  As you can see, this log was made with kernel
> 2.4.20-6, but I got a similar crash with 2.4.26.  Does anybody have
> any ideas as to what may be causing the crash?
> 
> The "dma_intr" error messages go away if we turn off DMA with
> "hdparm -d0 /dev/hdX"; however, the system may still crash even with
> DMA turned off.
> 
> The "dma_intr" errors seem to occur only on our systems with a Via
> chipset.  If we boot the drives on a different system with a
> different chipset, the "dma_intr" message does not appear.
> 
> Thanks a lot for your time, and I hope some of you might have some
> pointers on what's going on here.
> 
> Marat 
> 
> 
I threw your first error into Google and got back a number of related
hits.  Here's the tiny URL to the results page: http://tinyurl.com/3n65p
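
As for what the messages mean: the braces in each dma_intr line are
just the status and error bytes spelled out bit by bit, so they can be
decoded by hand. Here is a minimal Python sketch of that decoding. It
assumes the standard ATA register bit layout and reuses the label
strings from your log; the labels for bits that do not appear in your
log, the table names and the decode_bits helper are my own
illustration, not the kernel's code.

```python
# Sketch: decode the status/error bytes printed by dma_intr.
# Assumption: standard ATA register bit layout. Labels seen in the log
# (DriveReady, SeekComplete, Error, DriveStatusError, BadCRC) are copied
# from it; the remaining labels are illustrative names for the other bits.

STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Listed in the order the log prints them (DriveStatusError before BadCRC).
ERROR_BITS = [
    (0x04, "DriveStatusError"),    # command aborted
    (0x80, "BadCRC"),              # interface CRC error on UDMA transfers
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]


def decode_bits(value, table):
    """Return the labels of every bit in `table` that is set in `value`."""
    return [label for mask, label in table if value & mask]


if __name__ == "__main__":
    for name, value, table in [
        ("status", 0x51, STATUS_BITS),
        ("error", 0x84, ERROR_BITS),
        ("error", 0x04, ERROR_BITS),
    ]:
        labels = " ".join(decode_bits(value, table))
        print(f"{name}=0x{value:02x} {{ {labels} }}")
```

On UDMA transfers the BadCRC bit flags a CRC failure on the interface
between drive and controller, so the cable and the controller path are
worth checking before blaming the disk itself. Treat that as a lead to
follow up, not a diagnosis.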

James
