Monday, December 03, 2007

File copy failures across WAN links

We have a customer who is trying to copy data from his JDEdwards enterprise server (NT SQL) across the extranet, and the copy fails intermittently. After researching the issue, the customer reported that it has been occurring for a while and always seems to happen during times of high load (payroll processing, etc.). The file copy works from another server on our network (a dev client on the same subnet as the enterprise server).
A quick look at the usual areas (event log, CPU, memory, disk, network) showed no bottleneck that would explain this. To make matters worse, when we tried the same file copy from our local network, everything worked fine. Several network traces and reviews were performed, and while a few errors were found, nothing explained the issue.
Finally I set up a perfmon trace to run for the day on a couple hundred objects, expecting to review the logs and see what was spiking during business hours. After the trace completed, I noticed that Memory: Free System Page Table Entries was holding steady at around 3000, much lower than it should be (an unstressed system shows over 150000).
A quick Google search turned up an MSDN blog post (http://blogs.msdn.com/chadboyd/archive/2007/03/24/pae-and-3gb-and-awe-oh-my.aspx) discussing PTEs and how they fit into memory management. After reading the article's statement that free PTEs should never drop below 7000, I was surprised that we hadn't run into more problems with this server.
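
For anyone repeating this check, here is a rough sketch of how the review could be automated once the perfmon log has been exported to CSV. It is not what I ran at the time. The function name, the JDEENT01 server name, and the sample rows are made up for illustration, and the 7000 threshold is simply the figure quoted in the MSDN post above.

```python
import csv
import io

# Counter path as it appears in a perfmon CSV header, minus the \\SERVER prefix.
COUNTER_SUFFIX = r"\Memory\Free System Page Table Entries"


def low_pte_samples(csv_file, threshold=7000):
    """Return (timestamp, value) pairs where free system PTEs are below threshold."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Find the counter column by name, since the host prefix differs per server.
    columns = [i for i, name in enumerate(header) if name.endswith(COUNTER_SUFFIX)]
    if not columns:
        raise ValueError("counter not found in log: " + COUNTER_SUFFIX)
    col = columns[0]
    hits = []
    for row in reader:
        if len(row) <= col:
            continue
        try:
            value = float(row[col])
        except ValueError:
            continue  # skip samples that were logged as blank
        if value < threshold:
            hits.append((row[0], value))  # first column holds the timestamp
    return hits


if __name__ == "__main__":
    # Hypothetical three-sample log; JDEENT01 is a placeholder server name.
    sample = io.StringIO(
        '"(PDH-CSV 4.0) (Eastern Standard Time)(300)",'
        r'"\\JDEENT01\Memory\Free System Page Table Entries"' "\n"
        '"12/03/2007 09:00:00.000","152000"\n'
        '"12/03/2007 09:00:15.000","3012"\n'
        '"12/03/2007 09:00:30.000"," "\n'
    )
    for timestamp, value in low_pte_samples(sample):
        print(timestamp, int(value))
```

Matching on the counter name rather than a fixed column position means the same script should work on a log with a couple hundred counters in it.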

OK, so the root cause is now identified; how do I resolve it? If this server were running only SQL, I would have no problem adding the /USERVA switch, or even removing the /3GB switch, but this server also has JDEdwards installed and I don't know how it works with memory. Contacting my usual sources did little for me; they all knew how much memory JDE needed and how to configure and size it, but nobody knew how much system memory was needed or whether JDE could use AWE like SQL does.
A little more googling and I found a PPT discussing JDE on Windows (http://www.microsoft-oracle.com/Assets/ppt/JDE%20on%20SQL%20Server%202005%20webcast.ppt). This doc specifically states that JDE should use the /3GB switch, which suggests it needs the extra memory space as well. If I try to tune the memory by adjusting the settings in BOOT.INI, I may be able to resolve this issue, but it could compromise SQL and JDE performance.

Below are the options I have identified to resolve the issue:
  1. Remove the /3GB switch from boot.ini – Most of the JDE and SQL documentation I found recommends enabling this setting, so removing it would likely degrade SQL and JDE performance. Not a suggested option.
  2. Adjust the memory allocation with the /USERVA switch in boot.ini – Similar to removing the /3GB switch, this can degrade SQL and JDE performance. It may be possible to find an acceptable middle ground, but identifying the correct setting could take several weeks. Not a suggested option.
  3. Have JDE drop files to another system – This could address the immediate problem but not the root issue. Not a suggested option.
  4. Rebuild the system on a 64-bit OS – This would resolve the memory management issues. Suggested option.
  5. Move the SQL and Logic systems to different hardware – This puts SQL and JDE on separate servers, which allows more granular control and separation of resources. It also matches the new standard configuration for NT SQL customers of this size. Suggested option.
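
To put numbers on options 1 and 2, here is a small sketch of the address-space arithmetic behind the boot.ini switches on 32-bit Windows. The function name is made up, and the /USERVA values in the example are only illustrative, not settings I have tested on this server. The leftover kernel space is shared with other kernel structures, so this does not predict how many free PTEs you would actually get back.

```python
TOTAL_VA_MB = 4096  # a 32-bit process sees 4 GB of virtual address space


def kernel_space_mb(three_gb=False, userva=None):
    """Return the MB of virtual address space left for the kernel."""
    if not three_gb:
        # Default split is 2 GB user / 2 GB kernel; /USERVA has no effect without /3GB.
        return TOTAL_VA_MB - 2048
    # /3GB alone gives user mode 3072 MB; /USERVA=N lowers that to N MB.
    user_mb = 3072 if userva is None else userva
    if not 2048 <= user_mb <= 3072:
        raise ValueError("/USERVA must be between 2048 and 3072")
    return TOTAL_VA_MB - user_mb


if __name__ == "__main__":
    print("no switches:       ", kernel_space_mb())
    print("/3GB:              ", kernel_space_mb(three_gb=True))
    print("/3GB /USERVA=3030: ", kernel_space_mb(three_gb=True, userva=3030))
    print("/3GB /USERVA=2900: ", kernel_space_mb(three_gb=True, userva=2900))
```

Each MB handed back to the kernel is an MB taken away from SQL and JDE, which is why finding the middle ground in option 2 would be a trial-and-error exercise.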
