Thursday, May 6, 2010

BugCheck c2 RCA

Symptom:
Bugcheck code 000000C2
Arguments 00000000`00000007 00000000`0000121a 00000000`00000000 fffffadf`fff03010

Cause:
The storport.sys and HpCISSs2.sys version combination running on this system has produced this type of dump in the past. In case these drivers are not at fault, we also want to enable Special Pool through Driver Verifier so that we can catch whichever driver is overwriting memory it does not own.

Resolution:
Update storport and Hpcisss2 per KB 940015: "You receive a Stop error message after you install update 932755 or 941276 on an HP ProLiant server that is running Storport in Windows Server 2003"
http://support.microsoft.com/default.aspx?scid=kb;EN-US;940015
The article includes a link to the storport update.

For Hpcisss2 contact HP to obtain the latest HP ProLiant Support Pack for Microsoft Windows Server 2003 or to obtain an update to HP driver Hpcisss2.sys. For information about how to contact HP, visit the following Microsoft Web site: http://support.microsoft.com/gp/vendors

After storport and hpcisss2 are updated, we need to turn on Driver Verifier (verifier.exe) to try to catch the driver that could be corrupting memory.
244617 Using Driver Verifier to identify issues with Windows drivers for advanced users
http://support.microsoft.com/default.aspx?scid=kb;EN-US;244617

First, update the HPCISSS2.SYS and STORPORT.SYS files, because this version combination has produced this type of dump in the past and these drivers have caused known stop errors. Then enable Special Pool through Driver Verifier so that, if the drivers are not the cause, we can catch whichever component is overwriting memory it should not.

1. Run VERIFIER.EXE from START and RUN
2. Select "Create Custom Settings" and NEXT
3. Select "Select individual settings from a full list" and NEXT
4. Select "Special Pool" and NEXT
5. Select "Select driver names from a list" and NEXT
6. For the list, we can either choose everything or select individual driver names. If you select individual names, select all drivers that are not provided by Microsoft.
7. FINISH and reboot the machine.
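
The same setting can also be applied from a command prompt instead of the wizard. The sketch below only builds the command line and does not run it. It assumes the verifier.exe switches /flags, /driver and /all, with flag bit 0x1 meaning Special Pool; the driver name passed in is just an example, so substitute the non-Microsoft drivers found on the server.

```python
SPECIAL_POOL = 0x1  # Driver Verifier flag bit 0: Special Pool (assumed from the verifier.exe docs)

def verifier_command(drivers=None, flags=SPECIAL_POOL):
    """Build a verifier.exe command line; drivers=None means verify all drivers."""
    cmd = ["verifier", "/flags", hex(flags)]
    if drivers:
        cmd += ["/driver", *drivers]
    else:
        cmd.append("/all")
    return cmd

# Example driver list only; replace with the third-party drivers on the server.
print(" ".join(verifier_command(["HpCISSs2.sys"])))
```

As with the wizard, the settings take effect only after a reboot.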

*******************************************************************************
* Bugcheck Analysis *
*******************************************************************************

6: kd> vertarget
Windows Server 2003 Kernel Version 3790 (Service Pack 2) MP (8 procs) Free x64
Product: Server, suite: TerminalServer SingleUserTS
Built by: 3790.srv03_sp2_qfe.090319-1204
Machine Name:
Kernel base = 0xfffff800`01000000 PsLoadedModuleList = 0xfffff800`011d8280
Debug session time: Mon Jan 4 03:39:37.433 2010 (UTC - 6:00)
System Uptime: 92 days 3:08:10.224

6: kd> !sysinfo machineid
Machine ID Information [From Smbios 2.4, DMIVersion 36, Size=2523]
BiosVendor = HP
BiosVersion = I14
BiosReleaseDate = 11/03/2008
SystemManufacturer = HP
SystemProductName = ProLiant BL480c G1
SystemFamily = ProLiant
SystemSKU = 435462-B21

6: kd> x srv!srvcomputername
fffffadf`254e5a10 srv!SrvComputerName =
6: kd> !ustr fffffadf`254e5a10
String(24,34) srv!SrvComputerName+0000000000000000 at fffffadf254e5a10: PRBMSGHUB001

6: kd> .bugcheck
Bugcheck code 000000C2
Arguments 00000000`00000007 00000000`0000121a 00000000`00000000 fffffadf`fff03010

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request. Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 000000000000121a, (reserved)
Arg3: 0000000000000000, Memory contents of the pool block
Arg4: fffffadffff03010, Address of the block of pool being deallocated
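
For reference, a small sketch of how the two .bugcheck lines map onto the arguments described above. The helper name is made up for illustration, and the subtype table holds only the one value seen in this dump. The last two lines derive the pool page and offset of the freed block, which is the page !pool reports on.

```python
# Only the subtype seen in this dump; text copied from the analysis output above.
BAD_POOL_CALLER_SUBTYPES = {0x7: "Attempt to free pool which was already freed"}

def parse_bugcheck(code_line, args_line):
    """Turn the two .bugcheck output lines into integers."""
    code = int(code_line.split()[-1], 16)
    args = [int(a.replace("`", ""), 16) for a in args_line.split()[1:]]
    return code, args

code, args = parse_bugcheck(
    "Bugcheck code 000000C2",
    "Arguments 00000000`00000007 00000000`0000121a 00000000`00000000 fffffadf`fff03010",
)
block = args[3]
page = block & ~0xFFF      # 4 KB page that contains the freed block
offset = block - page      # position of the block inside that page
print(hex(code), BAD_POOL_CALLER_SUBTYPES.get(args[0], "unknown"), hex(page), hex(offset))
```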

6: kd> !pool fffffadffff03010
Pool page fffffadffff03010 region is Nonpaged pool expansion
fffffadffff03000 is not a valid large pool allocation, checking large session pool...
fffffadffff03000 is freed (or corrupt) pool
Bad allocation size @fffffadffff03000, zero is invalid
***
*** An error (or corruption) in the pool was detected;
*** Attempting to diagnose the problem.
***
*** Use !poolval fffffadffff03000 for more details.
***
Pool page [ fffffadffff03000 ] is __inVALID.
Analyzing linked list...
[ fffffadffff03000 ]: invalid block size [ 0x0 ] should be [ 0x40 ]
Scanning for single bit errors...
None found

6: kd> kv
Child-SP RetAddr : Args to Child : Call Site
fffffadf`23eb9948 fffff800`011ad769 : 00000000`000000c2 00000000`00000007 00000000`0000121a 00000000`00000000 : nt!KeBugCheckEx [d:\nt\base\ntos\ke\amd64\procstat.asm @ 170]
fffffadf`23eb9950 fffff800`010783fa : fffffadf`389ef080 fffffadf`35b6cbf0 fffffadf`fff03010 00000000`00000004 : nt!ExFreePoolWithTag+0x401 [d:\nt\base\ntos\ex\pool.c @ 4636]
fffffadf`23eb9a10 fffff800`01282eda : fffffadf`35aa33b0 fffffadf`23eb9cf0 00000000`00000000 fffffadf`359d64e0 : nt!IopAllocateIrpPrivate+0x13e [d:\nt\base\ntos\io\iomgr\iosubs.c @ 853]
fffffadf`23eb9a70 fffff800`01282b96 : 00000000`00000000 00000000`00000560 00000000`00000000 00000000`00000000 : nt!IopXxxControlFile+0x6fe [d:\nt\base\ntos\io\iomgr\internal.c @ 9042]
fffffadf`23eb9b90 fffff800`0102e37d : 00000000`058d8358 00000000`00000000 00000000`00000002 fffffadf`23eb9cf0 : nt!NtDeviceIoControlFile+0x56 [d:\nt\base\ntos\io\iomgr\devctrl.c @ 108]
fffffadf`23eb9c00 00000000`77ef0a5a : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KiSystemServiceCopyEnd+0x3 (TrapFrame @ fffffadf`23eb9c70) [d:\nt\base\ntos\ke\amd64\trap.asm @ 1974]
00000000`0574e828 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x77ef0a5a

6: kd> !pool 0xfffffadf`389ef080
Pool page fffffadf389ef080 region is Nonpaged pool
*fffffadf389ef000 size: 340 previous size: 0 (Allocated) *ObjT (Protected)
Pooltag ObjT : object type objects, Binary : nt!ob

6: kd> !handle 0x00000000`058d8358
processor number 6, process fffffadf35ba0bc0
PROCESS fffffadf35ba0bc0
SessionId: 0 Cid: 09b8 Peb: 7fffffd4000 ParentCid: 01c8
DirBase: 8d81a000 ObjectTable: fffffa8000361ad0 HandleCount: 478.
Image: MSExchangeTransportLogSearch.exe
Handle table at fffffa8002ee6000 with 478 Entries in use
8530300085320: Unable to read nonpaged object header

6: kd> lmvm storport
start end module name
fffffadf`29238000 fffffadf`29268000 storport (deferred)
Image path: \WINDOWS\system32\drivers\storport.sys
Image name: storport.sys
Timestamp: Sat Feb 17 00:02:03 2007 (45D69A5B)
CheckSum: 0003A30F
ImageSize: 00030000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

6: kd> lmvm hpcisss2
start end module name
fffffadf`294f3000 fffffadf`29505000 HpCISSs2 (deferred)
Image path: HpCISSs2.sys
Image name: HpCISSs2.sys
Timestamp: Tue Mar 20 22:18:22 2007 (4600A3FE)
CheckSum: 00015F1F
ImageSize: 00012000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4
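
The hex value in parentheses on each Timestamp line is the driver's link time in seconds since January 1, 1970 (UTC). A minimal sketch for decoding it when comparing driver builds follows. The helper name is made up; the debugger prints the time in its own local time zone, so the UTC values here differ from the lmvm text by that offset.

```python
from datetime import datetime, timezone

def link_time_utc(hex_stamp):
    """Convert an lmvm hex timestamp to a UTC datetime."""
    return datetime.fromtimestamp(int(hex_stamp, 16), tz=timezone.utc)

for name, stamp in [("storport.sys", "45D69A5B"), ("HpCISSs2.sys", "4600A3FE")]:
    print(name, link_time_utc(stamp).isoformat())
```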

Kernel Dump process for Windows

Instructions to check if the machine is configured properly for generating a kernel dump:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1) On "My Computer" right click and select "Properties"
2) Select the "Advanced" tab in the "System Properties" page
3) Under "Startup and Recovery" section click on the "Settings" button
4) Under the "Write Debugging Information" section select: "Kernel Memory Dump" from the drop down menu
5) Make sure a check mark is placed on:
"Write an event to the system log"
"Overwrite any existing file"
"Send an administrative alert"
6) Check if "Dump File" path is set to "%SystemRoot%\MEMORY.DMP"
7) Make sure that a page file is set on the system drive (C:) and that it is at least RAM + 12 MB
8) Check if there is at least 1.5 GB of free space on the system drive (C:)
9) Make sure ASR or any similar feature that may prevent the machine from writing a dump file is disabled in the BIOS
10) Make sure the page file is not split and resides entirely on C:\
11) For better performance it is advised to set the initial and maximum size of the page file to the same value, e.g. 1024 for Initial size and 1024 for Maximum size.
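
The sizing rules in steps 7, 8 and 10 can be written down as a quick check. This is only a sketch of the arithmetic from the list above. The function name and parameters are made up, and the numbers have to be read off the machine by hand.

```python
def kernel_dump_problems(ram_mb, pagefile_mb_on_c, free_mb_on_c, pagefile_split=False):
    """Return the checklist items (sizes in MB) that the machine fails."""
    problems = []
    if pagefile_mb_on_c < ram_mb + 12:
        problems.append("page file on C: is smaller than RAM + 12 MB")
    if free_mb_on_c < 1536:  # 1.5 GB
        problems.append("less than 1.5 GB free on C:")
    if pagefile_split:
        problems.append("page file is split across drives")
    return problems

# Example: 4 GB of RAM, 4108 MB page file on C:, 2 GB free.
print(kernel_dump_problems(4096, 4108, 2048))
```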

OSD for Workstation Deployments Not Functioning Properly

Symptom:
The customer recently implemented a new SCCM 2007 site server on a Windows Server 2008 system. They report multiple problems getting OSD working on this new site and need assistance getting it working in their environment.

Cause:
The following \Data files were missing or incorrect in the WIM:
- TSMBOOTSTRAP.INI
- VARIABLES.DAT

Resolution:
http://blog.coretech.dk/osdeploy/boot-media/using-wds-for-legacy-ris-images-sccm-boot-images-and-psp-sccm-pxe-service-point

++++++++++++++++
Creating the SCCM Boot.wim (this is a rewrite of Johan Arwidmark’s excellent guide)

1. Log on to the primary site server that holds the MP (if more than one is present, repeat steps 2 through 7 on each one).

2. In the TaskSequence node, create CD boot media, e.g. C:\Deploy.iso

3. Extract the C:\Deploy.iso to a folder e.g. C:\DeployCD

4. Using Imagex, mount C:\DeployCD\Sources\boot.wim to a folder, e.g. C:\Mount
imagex /mountrw C:\DeployCD\Sources\boot.wim 1 C:\Mount

5. Copy C:\DeployCD\SMS\DATA to C:\Mount\SMS (the entire folder not just the files)

6. Using Imagex, unmount the image and commit the changes.
imagex /unmount /commit C:\Mount

7. Add C:\DeployCD\Sources\boot.wim as a boot image on the WDS server.
++++++++++++++++

The customer performed the steps above and was able to get PXE boot working as it had in the old environment. Their paths were different in step 6, but the process is the same.

The WIM was missing these \Data files:
- TSMBOOTSTRAP.INI
- VARIABLES.DAT

The process above made those files available in the resulting WIM file.
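
Before committing the image in step 6, it can be worth confirming that both files actually landed in the mounted folder. A minimal sketch, assuming the image is mounted at C:\Mount and the folder copied in step 5 ends up as SMS\DATA; the function name is made up for illustration.

```python
from pathlib import Path

REQUIRED = ("TSMBOOTSTRAP.INI", "VARIABLES.DAT")

def missing_data_files(mount_dir):
    """Return the required file names not found under <mount_dir>/SMS/DATA."""
    data_dir = Path(mount_dir) / "SMS" / "DATA"
    # Compare upper-cased names, since Windows file names are not case sensitive.
    present = {p.name.upper() for p in data_dir.iterdir()} if data_dir.is_dir() else set()
    return [name for name in REQUIRED if name not in present]

# Assumed mount folder from step 4; an empty list means both files are present.
print(missing_data_files(r"C:\Mount"))
```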

Exchange Server Enterprise 2003/Having problem sending emails to distribution list

Setup:
=======
>>Multi domain environment

Issue :
=====
Mail sent to the corp DL gets stuck in the MADL queue
The corp DL is a nested DL
The type of the DL is GLOBAL security group
Expansion server is set to ANY SERVER

Symptom:
=========

When a user sends a message to a global distribution or security group in a multidomain forest, the users in the global distribution group may not receive the message. The last entry in the tracking logs is a submission to categorizer.

The person who sends the message does not receive a non-delivery report (NDR). The sent message disappears and is not delivered. The message to the Global DL that has disappeared in CAT cannot be retrieved. It is not in the Queue directory or in the Temp table.

Mail that is sent directly to individual members of the global groups delivers successfully.

Concern:
==========

The DS Access tab lists ‘n’ servers, so why does Exchange contact a GC in another domain?

This was working for years; why is it happening all of a sudden?

Explanation:
============

Exchange doesn't much care which GC it talks to. It picks a GC based on the DS Access tab of the expansion server and on the AD site. If the GC knows the DL name and knows its members, the query is resolved. If the GC doesn't know the members of the DL, the query ends there and, provided diagnostic logging is enabled for the categorizer (CAT), we see the hard error for the expansion failure in the application log.

In Exchange 2000 and 2003, Microsoft recommends that all distribution groups used for email are Universal groups, not Domain Local or Global groups. This has been our recommendation for many years, as configurations outside of this can result in abnormal mail flow (as you have seen) or lost email. To quote from our Knowledge Base Article #839949:

“Only universal group memberships are replicated across all domains to all global catalog servers in the forest. Microsoft always recommends using universal distribution groups for mail distribution in a multi-domain”

Now some further explanations as to why this is a problem:
====================================================

In short, Exchange is simply delivering the mail to the users that it is told should receive it. Please note that Exchange knows nothing about the members of the DL, it counts on the GC to provide this information. The basic process looks like this:

- Mail is sent to a distribution list from the mail client of choice.
- Exchange categorizes the message, and in the process needs to look up the members of the DL.
- Exchange sends an LDAP query to a GC, the GC looks up the DL name, checks the membership, and responds to Exchange with recipients.
- Exchange delivers successfully to all recipients. Looks good, the process worked.

Now let’s say the DL contains 100 recipients, but the GC that Exchange queries only knows about 20 of them. Because Exchange delivers based on what the GC tells it, Exchange delivers to those 20 users and is acting as designed. We cannot NDR the message, throw an error, or notify anyone that there was a problem, because the GC never told us the message was intended for the additional 80 people. In this example, the DL is missing 80 people from its membership because the recipients are spread across multiple domains, and global or domain local group memberships are not replicated to all GCs. As we know, only universal groups and their members are replicated across the organization to all GCs. This is the reason only universal groups are recommended and supported for mail flow.
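
The 20-out-of-100 case can be illustrated with a toy model. This is not how the categorizer or a GC is implemented. It only encodes the rule described above: a GC can return members for universal groups and for groups from its own domain, and nothing for global groups from other domains. The domain names "east" and "west" and the member names are made up.

```python
def visible_members(group, gc_domain):
    """Members a GC in gc_domain can return for one group (toy rule)."""
    if group["scope"] == "universal" or group["domain"] == gc_domain:
        return set(group["members"])
    return set()  # global group from another domain: membership not held by this GC

def expand(nested_groups, gc_domain):
    """Union of the members the chosen GC reports for every nested group."""
    recipients = set()
    for group in nested_groups:
        recipients |= visible_members(group, gc_domain)
    return recipients

# A DL nesting two global groups from different (hypothetical) domains.
nested = [
    {"domain": "east", "scope": "global", "members": [f"east{i}" for i in range(20)]},
    {"domain": "west", "scope": "global", "members": [f"west{i}" for i in range(80)]},
]
print(len(expand(nested, "east")))      # a GC from "east" reports only the east members

# The same groups converted to universal scope.
universal = [dict(group, scope="universal") for group in nested]
print(len(expand(universal, "east")))   # every GC reports the full membership
```

In the model, the recipient list depends on which GC happens to answer the query, which is why delivery can change without any change to the DL itself.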

It is also important to note that Exchange first queries any GC in its AD site. Remember that GCs from different domains can be kept in the same AD site. Group memberships, as noted above, are kept per domain. So when Exchange is looking for a GC, it is quite possible for it to pick alternate GCs in its AD site, each containing memberships for different domains. Again, this is another reason universal groups are recommended.

Also, please be aware that in Exchange 2007, only universal mail-enabled distribution lists can be created. You can, however, still view global groups in the Exchange Management Console.

Related articles:

http://support.microsoft.com/?id=839949
http://technet.microsoft.com/en-us/library/cc755692.aspx
http://msexchangeteam.com/archive/2006/06/23/428114.aspx

Windows Server 2003 Standard\print spooler - spooler service is crashing

Symptom:
Print spooler service crashes on Windows 2003 print server.

Cause:
Based on the crash dump analysis, we found that C:\WINDOWS\system32\spool\drivers\w32x86\3\CSPEUI.dll was causing the issue.

Resolution:
We renamed this DLL file and removed the AdobePS IKONV2_1 drivers from the print server.
We applied hotfix KB 946198 for the issue of printers showing as offline and rebooted the print server.
The print spooler seems to be stable after this fix.

Additional Information:
946198 The print queue status is displayed as "Offline" on a Windows Server 2003-based print server if SNMP is enabled and if the printer devices do not respond to SNMP commands
http://support.microsoft.com/default.aspx?scid=kb;EN-US;946198

Friday, April 30, 2010

Long time since my last post

I know it's been a very long time since my last post. Again, this is a website I use as a refresher for my own knowledge and a place to state my opinions. I would like to update this blog regularly and eventually make it more informative, so that it interests and entertains other people.

Monday, June 15, 2009

Change Cluster Account Password KB

Note: In Microsoft Windows NT 4.0 and Microsoft Windows 2000, to change the Cluster service account password, you have to stop the Cluster service on all nodes before you can make the password change.

To be prompted for the password, use the following parameters (the password is not visible while it is typed, and you are prompted to confirm it):
cluster /cluster:EASTCLUSTER /changepassword /skipdc /force

http://support.microsoft.com/kb/305813