
magosk · (This post was last modified: 01-03-2023, 10:59 AM by magosk.)

Hi,

Let me give you some background first. We have a couple of production servers powering a SaaS solution, one serving as the master and the other as a slave (continuously receiving data backups from the master). Both host three IntraWeb standalone applications (running as services) as well as a number of other backend services, some of which expose APIs. Previously, these services listened on different ports for HTTPS traffic, with an external firewall translating calls on the standard port 443 for different external IP addresses (each tied to a unique URL) to the port used internally on the server. The three IW services and a couple of APIs written in Delphi all used OpenSSL, whereas one .NET service used Windows' own SSL functionality for its API. As we were running out of IP addresses at the hosting site, we decided to refactor all our applications to use http.sys, so that all services could share the same port and external IP address, distinguished by different paths in the URL (such as '/api', '/m' etc.). A secondary reason was to avoid being stuck with OpenSSL 1.0.2, which no longer receives updates.
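To make the path-based sharing concrete, here is a minimal sketch of how requests on one port can be dispatched by URL path. It is a simplified model of longest-prefix matching, not http.sys's actual implementation. The prefixes '/api' and '/m' are the ones mentioned above; the '/' prefix and all service names are hypothetical placeholders.

```python
# Hypothetical mapping from URL path prefix to the service registered for it.
# Only '/api' and '/m' come from our setup; the rest is made up for illustration.
ROUTES = {
    "/api": "backend-api",
    "/m": "mobile-iw-app",
    "/": "main-iw-app",
}


def resolve(path: str) -> str:
    """Return the service whose registered prefix is the longest match for path."""
    best = None
    for prefix in ROUTES:
        base = prefix.rstrip("/")
        # Match whole path segments only, so '/m' does not capture '/media'.
        matches = prefix == "/" or path == base or path.startswith(base + "/")
        if matches and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        raise LookupError(f"no service registered for {path!r}")
    return ROUTES[best]


if __name__ == "__main__":
    for p in ("/api/v1/users", "/m", "/media/logo.png"):
        print(p, "->", resolve(p))
```

With this model, all services sit behind the same port and IP address, and only the path decides which one receives the request.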

This all worked well in test environments, and also initially after upgrading our production services to the new version using http.sys. However, after a few days of operation the master server started getting various errors, which could not be resolved by anything other than a restart of the server. This repeated itself every 2 or 3 days. We tried switching operation to the other (slave) server, but the same errors occurred there after it had been running for a few days. We also tried upgrading the OS of one of the servers to Windows Server 2019 (previously both were running Windows Server 2012), but that did not solve the problem. The errors we saw in our maintenance logs had not occurred before the upgrade to http.sys. They affect different services and there is no obvious connection between them, but once one of the errors started occurring, the others soon followed. My guess is that they are different symptoms of the same underlying issue, but not necessarily a clue to what the root problem is. These are the errors we have seen:

1. Exception EMenuError with message 'Out of system resources', raised when trying to set Enabled to False for a menu item (from an IntraWeb application).
2. A command-line application, sox.exe, used for audio file conversions fails, and you could (at least in Win 2012) see a corresponding error for a kernel DLL in the Windows Event log.
3. Parsing of XML files using MSXML (Microsoft XDOM) fails.
4. Failing to set up a new connection to a NexusDB server using a so-called SharedMemory transport.
5. Error event 10010 in the Windows Event log (timeout error for DistributedCOM), though not seemingly connected to an event in any of our services.

We have improved the error handling in our code, removed unnecessary usage of the sox application, replaced MSXML with OXML, switched to another type of transport towards nxServer etc., making our applications more robust against the errors. However, errors 1, 2 and 5 still occur, typically after 2-10 days of operation since the last restart, and then we need to restart again. Has anyone else experienced similar problems? Any insights as to why this is happening, and possible solutions or workarounds, would be much appreciated.

Best regards

Magnus Oskarsson
