LPC debugging


Post by J_R » Fri Jan 02, 2009 7:17 pm

I figured if anyone can answer my question, you can. I have vol 1+2 of dump analysis encyclopedia and they are the best books I have seen on the topic so far.

Here is my problem. We have a COM server with components running as servers (dllhost.exe). Sometimes the components stop working: they just hang when the front-end app makes calls to them. When I dump the dllhost process I see 1900+ threads, which is the first indicator of a problem; it typically runs with 30-33 threads. I took a complete dump so I could use the !lpc commands and see which process is talking to which. I found that all of these threads are waiting for an LPC reply, sometimes for more than an hour.

The server thread working on each message is in the svchost.exe process that hosts the rpcss service. Rpcss has more than 2000 threads; it normally has between 15 and 20. All of the extra threads are processing messages from our COM objects, and they are waiting on a SWMR lock. Again, the wait times for this lock can be more than an hour. Both the user-mode and kernel-mode !locks commands show nothing currently locked. Here is an example thread in rpcss; there are more than 1500 of these.


1: kd> !thread 88d233f0 16
THREAD 88d233f0 Cid 02a4.57bc Teb: 7f69f000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
88d7f2f8 NotificationEvent
Not impersonating
DeviceMap e167de28
Owning Process 8a09ba58 Image: svchost.exe
Attached Process N/A Image: N/A
Wait Start TickCount 202494014 Ticks: 197625 (0:00:51:27.890)
Context Switch Count 10
UserTime 00:00:00.000
KernelTime 00:00:00.000
Win32 Start Address 0x08d7180a
LPC Server thread working on message Id 8d7180a
Start Address kernel32!BaseThreadStartThunk (0x77e617ec)
Stack Init 9969a000 Current 99699c60 Base 9969a000 Limit 99697000 Call 0
Priority 9 BasePriority 8 PriorityDecrement 0
Kernel stack not resident.
ChildEBP RetAddr Args to Child
99699c78 8083d5b1 88d233f0 88d23498 00000001 nt!KiSwapContext+0x26 (FPO: [Uses EBP] [0,0,4])
99699ca4 8083df9e 00000000 00000000 00000000 nt!KiSwapThread+0x2e5 (FPO: [0,7,0])
99699cec 8092ae57 88d7f2f8 00000006 80925e01 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])
99699d50 80833bdf 0000a588 00000000 00000000 nt!NtWaitForSingleObject+0x9a (FPO: [SEH])
99699d50 7c8285ec 0000a588 00000000 00000000 nt!KiFastCallEntry+0xfc (FPO: [0,0] TrapFrame @ 99699d64)
2144f7a4 7c827d0b 77e61d1e 0000a588 00000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
2144f7a8 77e61d1e 0000a588 00000000 00000000 ntdll!NtWaitForSingleObject+0xc (FPO: [3,0,0])
2144f818 77e61c8d 0000a588 ffffffff 00000000 kernel32!WaitForSingleObjectEx+0xac (FPO: [SEH])
2144f82c 77cb0db0 0000a588 ffffffff 00e44ff4 kernel32!WaitForSingleObject+0x12 (FPO: [2,0,0])
2144f840 77c97bc5 2144f89c 0119ae3e 00e44ff4 RPCRT4!SWMRLock::LockSharedOrExclusiveOnLastWriter+0x32 (FPO: [4,0,4])
2144f868 77c76c14 2144f89c 00000001 00000002 RPCRT4!SWMRLock::LockSharedOrExclusive+0xa6 (FPO: [3,1,4])
2144f890 77c76d89 00000000 00000000 00000010 RPCRT4!NDRSContextUnmarshall2+0x1b9 (FPO: [5,2,4])
2144f8bc 77c76c89 4144f938 7fd82e2a 2144f938 RPCRT4!NdrServerContextNewUnmarshall+0xa6 (FPO: [2,2,0])
2144f8d0 77c80752 2144f938 2144fae4 7fd82e2a RPCRT4!NdrUnmarshallHandle+0x52 (FPO: [4,0,4])
2144f900 77ce332f 2144fae4 00095d58 00000000 RPCRT4!NdrpServerUnMarshal+0x13d (FPO: [1,3,4])
2144fcf8 77ce35c4 00000000 00000000 00e12fec RPCRT4!NdrStubCall2+0x19f (FPO: [SEH])
2144fd14 77c7ff7a 00e12fec 00095d58 00e12fec RPCRT4!NdrServerCall2+0x19 (FPO: [1,1,0])
2144fd48 77c8042d 7fd8218d 00e12fec 2144fdec RPCRT4!DispatchToStubInCNoAvrf+0x38 (FPO: [SEH])
2144fd9c 77c80353 00000004 00000000 7fdd1580 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x11f (FPO: [4,13,4])
2144fdc0 77c811dc 00e12fec 00000000 7fdd1580 RPCRT4!RPC_INTERFACE::DispatchToStub+0xa3 (FPO: [4,0,4])
2144fdfc 77c812f0 00135e00 000904d8 015dd6c0 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x42c (FPO: [0,6,4])
2144fe20 77c88678 00090510 2144fe38 00135e00 RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x127 (FPO: [4,4,4])
2144ff84 77c88792 2144ffac 77c8872d 000904d8 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x430 (FPO: [0,14,0])
2144ff8c 77c8872d 000904d8 00000000 00000000 RPCRT4!RecvLotsaCallsWrapper+0xd (FPO: [1,0,0])
2144ffac 77c7b110 00084850 2144ffec 77e64829 RPCRT4!BaseCachedThreadRoutine+0x9d (FPO: [1,2,4])
2144ffb8 77e64829 00e44f78 00000000 00000000 RPCRT4!ThreadStartRoutine+0x1b (FPO: [1,0,0])
2144ffec 00000000 77c7b0f5 00e44f78 00000000 kernel32!BaseThreadStart+0x34 (FPO: [SEH])

So, what are the suggested next steps to determine what holds the lock that this function is trying to acquire?
2144f840 77c97bc5 2144f89c 0119ae3e 00e44ff4 RPCRT4!SWMRLock::LockSharedOrExclusiveOnLastWriter+0x32 (FPO: [4,0,4])


The stacks are the same apart from the parameters. This function is always called from the rpcrt4 marshalling functions. It seems like whatever thread had the lock is gone and the rest of them don't know it, so they wait forever. For now we monitor the thread count in rpcss and restart it when the count begins to climb. Rpcss can run for 5 minutes or 5 days before the count begins to climb. It happens in chunks of threads, too. For example, rpcss will run fine, then something happens and 100 threads hang. Rpcss continues to function with 100 extra threads, and then sometime later the same thing happens, adding another 100 or so hung threads. These threads never go away until the process is restarted.

Any ideas?
J_R
 
Posts: 6
Joined: Mon Nov 24, 2008 9:25 pm

Re: LPC debugging

Post by VDO » Wed Jan 14, 2009 11:37 am

SWMR Lock should mean Single Writer Multiple Reader (I guess), so I would suggest finding the thread that is keeping the lock's synchronization object in the non-signaled state:

99699cec 8092ae57 88d7f2f8 00000006 80925e01 nt!KeWaitForSingleObject+0x346 (FPO: [5,13,4])


Its address is 88d7f2f8, and your thread has been waiting for 51 minutes:

Ticks: 197625 (0:00:51:27.890)
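
As a quick sanity check on that figure, the tick count can be converted by hand. This is only a sketch: it assumes a timer interval of 1/64 s (15.625 ms), which is what the debugger's own conversion in this dump implies, and the function name is made up for illustration. Other systems may use a different interval.

```python
from datetime import timedelta

# Assumed timer interval: 1/64 s (15.625 ms). It is consistent with the
# debugger's conversion shown in this dump, but it is not universal.
TICK_SECONDS = 1 / 64


def ticks_to_wait_time(ticks, tick_seconds=TICK_SECONDS):
    """Convert a 'Ticks:' value from !thread output into elapsed time."""
    return timedelta(seconds=ticks * tick_seconds)


wait = ticks_to_wait_time(197625)
print(wait)  # 0:51:27.890625
```

That agrees with the 0:00:51:27.890 the debugger printed, so the same arithmetic can be used to compare wait times across the hung threads and see which ones blocked first.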


Are there any suspicious threads in that process that do not wait on it but have a TID lower than the TID of the first waiting thread? Are there any threads in rpcss waiting for LPC messages (a potential LPC deadlock)? You can also try searching for this address in the raw stacks to find a thread that might have owned the lock as the single writer and is now blocked on something else, effectively blocking the multiple reader threads.
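
With 2000+ threads it may be easier to do the first check offline. Below is a rough sketch, not a debugger feature. It assumes you have saved the thread listing for the process to a text file, and it only recognizes the few wait object types named in the pattern. The function names are hypothetical, and "first waiting thread" is read here as the waiter with the lowest TID, which is only one way to interpret it.

```python
import re

# Thread header, e.g. "THREAD 88d233f0 Cid 02a4.57bc Teb: ..."
THREAD_RE = re.compile(
    r"^\s*THREAD\s+([0-9a-f]+)\s+Cid\s+([0-9a-f]+)\.([0-9a-f]+)", re.I)
# Waited-on object line, e.g. "88d7f2f8 NotificationEvent".
# Only these object types are recognized (an assumption of this sketch).
WAIT_OBJ_RE = re.compile(
    r"^\s*([0-9a-f]{8,16})\s+"
    r"(NotificationEvent|SynchronizationEvent|Semaphore|Mutant)\b", re.I)


def parse_threads(log_text):
    """Return one dict per thread: address, TID, and waited-on objects."""
    threads = []
    for line in log_text.splitlines():
        m = THREAD_RE.match(line)
        if m:
            threads.append({"thread": m.group(1).lower(),
                            "tid": int(m.group(3), 16),
                            "waits": []})
            continue
        m = WAIT_OBJ_RE.match(line)
        if m and threads:
            threads[-1]["waits"].append(m.group(1).lower())
    return threads


def suspects(threads, lock_object):
    """Threads not waiting on lock_object with a TID below the lowest waiter TID."""
    waiters = [t for t in threads if lock_object in t["waits"]]
    if not waiters:
        return []
    first_tid = min(t["tid"] for t in waiters)
    return [t for t in threads
            if lock_object not in t["waits"] and t["tid"] < first_tid]


# The first thread is from the dump above; the second one is invented
# purely to illustrate the filter.
sample = """\
THREAD 88d233f0 Cid 02a4.57bc Teb: 7f69f000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    88d7f2f8 NotificationEvent
THREAD 89aaa000 Cid 02a4.0b10 Teb: 7ffdd000 Win32Thread: 00000000 WAIT: (Unknown) UserMode Non-Alertable
    89bbb000 Semaphore Limit 0x1
"""

for t in suspects(parse_threads(sample), "88d7f2f8"):
    print(t["thread"], hex(t["tid"]), t["waits"])
```

Anything it prints is only a candidate to inspect with !thread; it does not prove that the thread owns the lock.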
VDO
Site Admin
 
Posts: 552
Joined: Mon May 01, 2006 10:34 am
Location: Dublin, Ireland

