Outbound delivery unsuccessful with short DNS TTLs

markpizz
Posts: 1
Joined: Mon Jun 30, 2014 12:03 am


Post by markpizz »

MailEnable has trouble sending to the domain GEWT.NET.

The single MX record for GEWT.NET points at MAILER.GEWT.NET.
MAILER.GEWT.NET has an A record. This is completely normal, except that the MX and A records both have a 5 minute TTL.

The records for GEWT.NET have relatively short TTLs and see very low query rates from the world at large, so when an outgoing email delivery is attempted, the answer to the initial query passed to a local caching name server is essentially never in the cache. This is compounded by the fact that the related NS records for this domain also have a 5 minute TTL, so the initial attempt will usually time out while the recursive resolver digs up the answer to the MX record lookup. This seems to be true regardless of which nameserver is being queried. Some nameservers (8.8.8.8 or 75.75.75.75) tend not to return the mailer.gewt.net A record as additional data in the query response which is eventually returned.

I've traced DNS traffic from MailEnable to the nameserver which is configured as the MailEnable SMTP DNS nameserver.
MailEnable generates the MX lookup and logs the following in the debug log:
06/29/14 16:00:20 ME-I0123: Domain [gewt.net] has MX list [mailer.gewt.net]
06/29/14 16:00:20 ME-I0026: [02595B5C717B44D0A71E8FE0F8823189.MAI] Sending message
06/29/14 16:00:34 MF-E0039: [02595B5C717B44D0A71E8FE0F8823189.MAI] DNS Lookup failure (11001): Domain mailer.gewt.net does not exist.

I'm using an alternate DNS server running BIND, since the Microsoft nameserver used by the local domain is much less reliable and is easily subject to cache corruption.

The above log entry suggests that a DNS lookup was done to find the A record. However, NO DNS A record lookup traffic is observed going from the MailEnable machine to the nameserver. Maybe the A record lookup was done by a gethostbyname() API call, which would use the Windows default nameserver. I now believe that this is EXACTLY what is happening, since the lookup failure does not occur when I manually populate the cache of the default Windows nameserver AND the nameserver specified in the MailEnable SMTP setup. The error code 11001 is also a WINSOCK error (WSAHOST_NOT_FOUND), which is what gethostbyname() returns when a host cannot be found.
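To make the distinction concrete, here is a minimal Python sketch of the two lookup paths. It is only an illustration and says nothing about how MailEnable is actually implemented. The function names are my own, and the packet layout follows RFC 1035. One path sends a query to an explicitly chosen nameserver, the other hands the name to the operating system resolver, which is free to ignore any application-level DNS setting.

```python
import random
import socket
import struct

QTYPE_A = 1    # IPv4 address record
QTYPE_MX = 15  # mail exchanger record
QCLASS_IN = 1  # Internet class


def build_query(name, qtype, query_id=None):
    """Build a minimal DNS query packet (RFC 1035) with a single question."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (only "recursion desired" set), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is prefixed by its length; a zero byte ends it.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, QCLASS_IN)


def query_explicit_server(name, qtype, server, timeout=5.0):
    """Send the query over UDP to one specific nameserver; return the raw reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        reply, _ = sock.recvfrom(4096)
        return reply


def query_system_resolver(name):
    """Resolve through the OS resolver, which uses the system default nameserver."""
    return socket.gethostbyname(name)
```

A packet trace on the explicitly configured nameserver would show traffic from query_explicit_server() but nothing from query_system_resolver(), which matches what I see for the A record lookup.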

I initially thought that older versions of MailEnable tended to encounter this condition, fail the current attempt, and leave the message in the queue for a later retry. Of course, the retry would happen more than 5 minutes later, by which time all of the cached info at the nameserver would have expired, so things would merely repeat and get the same result.
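The timing problem can be shown with a toy TTL cache in Python. This is a sketch only: the class is hypothetical, the 900 second retry delay is an assumed value (all I know is that the retry comes more than 5 minutes later), and 192.0.2.10 is a documentation placeholder, not the real address of mailer.gewt.net.

```python
class TtlCache:
    """Toy resolver cache: an entry is usable only until its TTL runs out."""

    def __init__(self):
        self._entries = {}

    def put(self, key, value, ttl_seconds, now):
        # Store the value together with its absolute expiry time.
        self._entries[key] = (value, now + ttl_seconds)

    def get(self, key, now):
        entry = self._entries.get(key)
        if entry is None or now >= entry[1]:
            # Missing or expired: drop it and report a cache miss.
            self._entries.pop(key, None)
            return None
        return entry[0]


cache = TtlCache()
# First attempt at t=0: the resolver finally learns the answer (TTL 300 s).
cache.put(("mailer.gewt.net", "A"), "192.0.2.10", ttl_seconds=300, now=0)

# A retry inside the TTL would hit the cache...
hit = cache.get(("mailer.gewt.net", "A"), now=299)
# ...but a retry after the assumed 900 s delay finds it expired again.
miss = cache.get(("mailer.gewt.net", "A"), now=900)
```

Any retry interval longer than the TTL puts the retry in exactly the same cold-cache position as the first attempt.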

The current version of MailEnable encounters this condition (without attempting an explicit A record lookup) and fails the delivery attempt immediately.

Now that I understand this is a gethostbyname() failure, I think nothing has actually changed internally in MailEnable's handling of this. I saw retries with the earlier MailEnable version and don't with the later one because the Microsoft nameserver has a different cache state now and is returning incorrect answers (thus the nonexistent domain).

I can manage to get a message out only if I use nslookup to manually query BOTH the configured SMTP DNS nameserver AND the default Windows nameserver prior to sending the message. With both caches primed, MailEnable's MX query and the gethostbyname() A record lookup both succeed, and the message is delivered.
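For anyone who wants to script this workaround, here is a rough Python sketch that runs the same nslookup queries against each nameserver. The function names are my own, and the server addresses in the usage line are placeholders that you would replace with your SMTP DNS nameserver and your Windows default nameserver.

```python
import subprocess


def priming_commands(domain, mail_host, nameservers):
    """Build the nslookup commands that warm each nameserver's cache."""
    commands = []
    for server in nameservers:
        # MX lookup for the domain, then A lookup for the mail host it points at.
        commands.append(["nslookup", "-type=MX", domain, server])
        commands.append(["nslookup", "-type=A", mail_host, server])
    return commands


def prime_caches(domain, mail_host, nameservers, timeout=30):
    """Run each priming query; the output is discarded, only the caching matters."""
    for command in priming_commands(domain, mail_host, nameservers):
        try:
            subprocess.run(command, check=False, capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            # A slow first answer is expected here; move on to the next query.
            continue


# Example (placeholder addresses, not real servers):
# prime_caches("gewt.net", "mailer.gewt.net", ["192.0.2.53", "198.51.100.53"])
```

This only helps if the message is sent within the 5 minute TTL after priming, so it is a stopgap at best.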

The defect I'm describing here is that gethostbyname() is being used to look up A records. This would normally be fine if you hadn't explicitly provided a DNS server to query as part of the SMTP server configuration. Given that the DNS server is explicitly configured, reasonable people would expect it to be used for all DNS queries that the SMTP server makes.

This problem is coming to light in my case because of the short TTLs on the records in the GEWT.NET domain. As a workaround, I'm getting the admin of that domain to increase his TTLs to much larger values, but the underlying problem should really be fixed.

MailEnable-Ian
Site Admin
Posts: 9738
Joined: Mon Mar 22, 2004 4:44 am
Location: Melbourne, Victoria, Australia

Re: Outbound delivery unsuccessful with short DNS TTLs

Post by MailEnable-Ian »

Hi,

Yes, we can confirm this is a problem. It is a currently logged defect that will be addressed in a future release.
Regards,

Ian Margarone
MailEnable Support
