Koozali.org: home of the SME Server

TLS failed: Could not create SSL socket -intermittent

Offline biogenic

  • 12
  • +0/-0
TLS failed: Could not create SSL socket -intermittent
« on: August 06, 2020, 05:37:05 AM »
Hi! Would be grateful if someone could shed whatever light on below.

Errors:
TLS failed: Could not create SSL socket at /usr/share/qpsmtpd/plugins/tls line 235.
TLS failed: Could not create SSL socket: Connection timed out at /usr/share/qpsmtpd/plugins/tls line 235.

Setup: SME Server 9.2 using self-signed certificate
We use messagelabs as email gateway as mandated by head office, so all our incoming email to SME-server passes thru their servers.
Our SME Server is on a VM, and is behind a firewall. 

It happens to ~20% of emails - out of 946 incoming emails sampled, 216 were denied because TLS fails / timeouts,
but the remaining 730 were successfully received/queued. 

Any suggestions where to look why TLS fails sometimes but not all the time?
I noticed this happens when there are many TLS connections open.
Is it a problem with the SME-server, the firewall, the network, or Messagelabs?

I read on this forum post
about disabling TLS on port 25 - will that help or hurt? 
https://forums.contribs.org/index.php/topic,51492.msg261704.html#msg261704

Below are the logs, they're different dates but still representative of the actual problem logs:
1) qpsmtpd when email does not get through
2) qpsmtpd when email gets through (from same messagelabs server as above)
3) messagelabs logs when email does not get through
4) firewall logs showing connection failed

Thank you in advance for any guidance.

--------------------------------------------------------------------------------------
This is the sample qpsmtpd log when the email does NOT get through
--------------------------------------------------------------------------------------
Code: [Select]
2020-08-06 08:44:27.779446500 18375 Accepted connection 3/40 from 192.168.189.2 / Unknown
2020-08-06 08:44:27.779774500 18375 Connection from Unknown [192.168.189.2]
2020-08-06 08:44:28.061458500 18375 (connect) relay: skip, no match
2020-08-06 08:44:28.061859500 18375 220 xxx.xxx.com.ph ESMTP
2020-08-06 08:44:28.298624500 18375 dispatching EHLO mail6.bemta23.messagelabs.com
2020-08-06 08:44:28.300730500 18375 250-xxx.xxx.com.ph Hi Unknown [192.168.189.2]
2020-08-06 08:44:28.300866500 18375 250-PIPELINING
2020-08-06 08:44:28.300965500 18375 250-8BITMIME
2020-08-06 08:44:28.301067500 18375 250-SIZE 15000000
2020-08-06 08:44:28.301165500 18375 250 STARTTLS
2020-08-06 08:44:28.537959500 18375 dispatching STARTTLS
2020-08-06 08:44:28.538237500 18375 220 Go ahead with TLS

...other transaction logs
...
...after 5 minutes...

2020-08-06 08:49:58.539777500 TLS failed: Could not create SSL socket:  at /usr/share/qpsmtpd/plugins/tls line 235.
2020-08-06 08:49:58.539779500
2020-08-06 08:49:58.539780500 18375 (deny) logging::logterse: ` 192.168.189.2 Unknown mail6.bemta23.messagelabs.com tls 901 TLS Negotiation Failed msg denied before queued
2020-08-06 08:49:58.539781500 18375 500 TLS Negotiation Failed

----------------------------------------------------------------------------------------
This is the sample qpsmtpd log when the email gets through:
----------------------------------------------------------------------------------------
Code: [Select]
2020-08-06 04:55:14.966213500 6137 Accepted connection 0/40 from 192.168.189.2 / Unknown
2020-08-06 04:55:14.966590500 6137 Connection from Unknown [192.168.189.2]
2020-08-06 04:55:15.285346500 6137 (connect) relay: skip, no match
2020-08-06 04:55:15.285706500 6137 220 xxx.xxx.com.ph ESMTP
2020-08-06 04:55:15.549034500 6137 dispatching EHLO mail6.bemta23.messagelabs.com
2020-08-06 04:55:15.551345500 6137 250-xxx.com.ph Hi Unknown [192.168.189.2]
2020-08-06 04:55:15.551444500 6137 250-PIPELINING
2020-08-06 04:55:15.551535500 6137 250-8BITMIME
2020-08-06 04:55:15.551632500 6137 250-SIZE 15000000
2020-08-06 04:55:15.551725500 6137 250 STARTTLS
2020-08-06 04:55:15.814714500 6137 dispatching STARTTLS
2020-08-06 04:55:15.815018500 6137 220 Go ahead with TLS
2020-08-06 04:55:58.886774500 6137 (unrecognized_command) tls: TLS setup returning
2020-08-06 04:55:59.149296500 6137 dispatching EHLO mail6.bemta23.messagelabs.com
2020-08-06 04:55:59.150574500 6137 250-xxx.com.ph Hi Unknown [192.168.189.2]
2020-08-06 04:55:59.150674500 6137 250-PIPELINING
2020-08-06 04:55:59.150799500 6137 250-8BITMIME
2020-08-06 04:55:59.150902500 6137 250-SIZE 15000000
2020-08-06 04:55:59.150996500 6137 250 AUTH PLAIN LOGIN
2020-08-06 04:55:59.417272500 6137 dispatching MAIL FROM:<prvs=4797c41c4=yyy@dhl.com>
2020-08-06 04:55:59.418706500 6137 (mail) naughty: pass
2020-08-06 04:55:59.419569500 6137 250 <prvs=4797c41c4=yyy@dhl.com>, sender OK - how exciting to get mail from you!
2020-08-06 04:55:59.683141500 6137 dispatching RCPT TO:<zzz@xxx.com.ph>
2020-08-06 04:55:59.684717500 6137 (rcpt) badrcptto: pass
2020-08-06 04:55:59.684976500 6137 (rcpt) check_goodrcptto: stripping '-' extensions
2020-08-06 04:55:59.694640500 6137 (rcpt) rcpt_ok: pass: xxx.com.ph in rcpthosts
2020-08-06 04:55:59.695018500 6137 250 <zzz@xxx.com.ph>, recipient ok
2020-08-06 04:55:59.958253500 6137 dispatching RCPT TO:<aaa@xxx.com.ph>
2020-08-06 04:55:59.959491500 6137 (rcpt) badrcptto: pass
2020-08-06 04:55:59.959706500 6137 (rcpt) check_goodrcptto: stripping '-' extensions
2020-08-06 04:55:59.969948500 6137 (rcpt) rcpt_ok: pass: xxx.com.ph in rcpthosts
2020-08-06 04:55:59.970210500 6137 250 <aaa@xxx.com.ph>, recipient ok
2020-08-06 04:56:00.233228500 6137 dispatching DATA
2020-08-06 04:56:00.233849500 6137 354 go ahead
2020-08-06 04:56:00.760909500 6137 spooling message to disk
2020-08-06 04:56:03.009972500 6137 (data_post_headers) dkim: pass, valid signature, accept policy
2020-08-06 04:56:03.010108500 6137 (data_post_headers) dkim: karma 1 (1)
2020-08-06 04:56:03.012465500 6137 (data_post) bogus_bounce: pass, not a null sender
2020-08-06 04:56:03.013624500 6137 (data_post) headers: pass
2020-08-06 04:56:03.013885500 6137 (data_post) naughty: pass
2020-08-06 04:56:03.085614500 6137 (queue) logging::logterse: ` 192.168.189.2 Unknown mail6.bemta23.messagelabs.com <prvs=4797c41c4=zzz@dhl.com> <yyy@xxx.com.ph>,<aaa@xxx.com.ph> queued <abcdefghijklmnop@dhl.com>
2020-08-06 04:56:03.093268500 6144 (queue) queue::qmail_2dqueue: (for 6137) Queuing to /var/qmail/bin/qmail-queue
2020-08-06 04:56:03.582931500 6137 250 Queued! 1596660963 qp 6144 <abcdefghijklmnop@dhl.com>
2020-08-06 04:56:08.847242500 6137 dispatching QUIT
2020-08-06 04:56:08.847887500 6137 221 xxx.com.ph closing connection. Have a wonderful day.
2020-08-06 04:56:08.848304500 6137 click, disconnecting
----------------------------------------------------------------
Sample logs from messagelabs when unable to get through
----------------------------------------------------------------
Code: [Select]
2020-06-24 01:57:19 PM Connection from: 209.85.167.182 (mail-oi1-f182.google.com)
2020-06-24 01:57:19 PM Sending server HELO string:mail-oi1-f182.google.com
2020-06-24 01:57:19 PM Message id:CAFrFzwyjV=uFR5hoc37VNF3Cp8VYMHdgrsP5U4_N_L1=NJWecQ@mail.gmail.com
2020-06-24 01:57:19 PM Message reference: 159297823900000012963570001354022
2020-06-24 01:57:19 PM Sender: xxx@gmail.com
2020-06-24 01:57:19 PM Recipient: bbb@xxx.com.ph
2020-06-24 01:57:20 PM SMTP Status: OK
2020-06-24 01:57:21 PM Delivery attempt #1
2020-06-24 01:57:21 PM Recipient server: 0.0.0.0
2020-06-24 01:57:21 PM Response: 421 4.4.0 [internal] no mail servers for this domain could be reached at this time
2020-06-24 02:08:47 PM Delivery attempt #2
2020-06-24 02:08:47 PM Recipient server: 1.2.3.5 (1.2.3.5)
2020-06-24 02:08:47 PM Response: 450 4.4.1 [internal] Connection Timed Out
2020-06-24 02:09:34 PM Delivery attempt #3
2020-06-24 02:09:34 PM Recipient server: 1.2.3.5 (1.2.3.5)
2020-06-24 02:09:34 PM Response: 421 4.4.0 [internal] no mail servers for this domain could be reached at this time
2020-06-24 02:36:10 PM Delivery attempt #4 (final)
2020-06-24 02:36:10 PM Recipient server: 1.2.3.5 (1.2.3.5)
2020-06-24 02:36:10 PM Response: 250 Queued! 1592980570 qp 25886

---------------------------------------------------------------------------------------------------------------------------------------
Sample logs from firewall
                          - 67.219.xxx.xxx messagelabs
                          - 192.168.0.1    SME server internal IP
                    - 192.168.189.2  internal firewall IP
                          - 1.2.3.4        Firewall public IP stand-in   
                          - 1.2.3.5        SME Server public IP standin                  
---------------------------------------------------------------------------------------------------------------------------------------
Code: [Select]
238 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 1977416 35347 94022833 1.2.3.4 50066 6 50066 25
239 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 1977505 35191 94022834 1.2.3.4 50068 6 50068 25
240 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 1977521 35347 94022827 1.2.3.4 50054 6 50054 25
241 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 1977466 35195 94022828 1.2.3.4 50056 6 50056 25
242 08/04/2020 10:19 notice accept 192.168.0.1 67.219.250.206 SMTP 22603 846 93996452 1.2.3.4 50024 6 50024 25
243 08/04/2020 10:19 notice close 67.219.246.154 1.2.3.5 SMTP 62630 1757 94023043 192.168.189.2 15229 6 15229 25
244 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.110 SMTP 822783 14608 93981062 1.2.3.4 43390 6 43390 25
245 08/04/2020 10:19 notice close 67.219.246.152 1.2.3.5 SMTP 823 15711 94007878 192.168.189.2 8686 6 8686 25
246 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.110 SMTP 27145 824 93981459 1.2.3.4 43392 6 43392 25
247 08/04/2020 10:19 notice accept 67.219.246.152 1.2.3.5 SMTP 631 15503 94007878 192.168.189.2 8686 6 8686 25
248 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 88802 2746 94021838 1.2.3.4 50052 6 50052 25
249 08/04/2020 10:19 notice close 192.168.0.1 67.219.250.110 SMTP 27110 846 93981060 1.2.3.4 43388 6 43388 25
250 08/04/2020 10:19 warning ip-conn 192.168.0.1 67.219.250.110 SMTP Failed Connection Attempt 93981060 6 43388 25
251 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 88803 2535 94021781 1.2.3.4 50050 6 50050 25
252 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.110 SMTP 27155 824 93981058 1.2.3.4 43384 6 43384 25
253 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.110 SMTP 27156 824 93981059 1.2.3.4 43386 6 43386 25
254 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 93880 2835 94021642 1.2.3.4 50046 6 50046 25
255 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 95373 2791 94021629 1.2.3.4 50044 6 50044 25
256 08/04/2020 10:19 warning ip-conn 192.168.0.1 67.219.250.206 SMTP Failed Connection Attempt 94021629 6 50044 25
257 08/04/2020 10:19 notice client-rst 192.168.0.1 67.219.250.206 SMTP 88700 2587 94021665 1.2.3.4 50048 6 50048 25
258 08/04/2020 10:19 notice close 192.168.0.1 67.219.250.110 SMTP 27104 750 93981057 1.2.3.4 43382 6 43382 25
259 08/04/2020 10:18 notice client-rst 192.168.0.1 67.219.250.110 SMTP 27163 824 93981054 1.2.3.4 43376 6 43376 25
260 08/04/2020 10:18 notice close 192.168.0.1 67.219.250.110 SMTP 27109 750 93981056 1.2.3.4 43380 6 43380 25
261 08/04/2020 10:18 notice close 192.168.0.1 67.219.250.110 SMTP 27108 750 93981055 1.2.3.4 43378 6 43378 25

« Last Edit: August 07, 2020, 08:31:47 AM by biogenic »

Offline biogenic

  • 12
  • +0/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #1 on: August 07, 2020, 09:08:35 AM »
Hi, just to narrow it down.. 

Would the following log snippet
"TLS setup returning" mean TLS handshake failed?

The log snippet:
2020-08-06 04:55:15.814714500 6137 dispatching STARTTLS
2020-08-06 04:55:15.815018500 6137 220 Go ahead with TLS
2020-08-06 04:55:58.886774500 6137 (unrecognized_command) tls: TLS setup returning
2020-08-06 04:55:59.149296500 6137 dispatching EHLO mail6.bemta23.messagelabs.com


Does this mean messagelabs uses another version?
I checked using openssl, here is the TLS version used by SME Server.
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
According to this page, messagelabs supports this: https://knowledge.broadcom.com/external/article/164629/tls-ciphers-supported-by-email-securityc.html

What do you think causes the timeout?
Do you think it's possible that the unused unsuccessful TLS connection stays open until it timeouts?
Is there a limit to maximum open TLS connections, and as such, the subsequent connections need to wait for the previous ones to timeout?

TLS failed: Could not create SSL socket: Connection timed out at /usr/share/qpsmtpd/plugins/tls line 235.
TLS failed: Could not create SSL socket at /usr/share/qpsmtpd/plugins/tls line 235.
« Last Edit: August 07, 2020, 09:27:21 AM by biogenic »

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #2 on: August 07, 2020, 12:42:09 PM »
Hmmm - first thanks for the well documented logs!

Second I don't have the answer to your issue, but we might need to tease out a bit more info first.

This in the message lab log makes me wonder a bit:

Quote
2020-06-24 01:57:21 PM Delivery attempt #1
2020-06-24 01:57:21 PM Recipient server: 0.0.0.0
2020-06-24 01:57:21 PM Response: 421 4.4.0 [internal] no mail servers for this domain could be reached at this time

Why is SME not available at all?? Is the network off? Is the network going off during the TLS negotiation causing a failure - my gut feeling is I suspect the VM is where your problem lies, not actually in the mail itself - you'd have the issue on every single mail otherwise.

Lets have a look at your configuration. Go to server-manager, Miscellaneous, Report a bug, Create Configuration report and then attach it here so we can have a look.

I guess your server is in server/gateway mode in its VM?

What is the Host OS? What resources are allocated to to the VM?

FWIW see here for Cipher settings though I am not convinced they are the real issue:

https://wiki.contribs.org/SSL_Settings

I'll try and get someone more knowledgeable than me to comment.
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline biogenic

  • 12
  • +0/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #3 on: August 07, 2020, 02:28:10 PM »
Hi, thanks for responding.
By the way I'm sorry, I just remembered from your response that the messagelabs logs are  older and might be linked to a prior error we encountered - the "Too many connections" error. The network was fine, SME server was simply not accepting more connections. We did increase the limit from the default 10 to 40.  That's when this new error shows up.

I'll try to get the latest messagelabs logs for this exact problem and post it again.
We do monitor the network / Internet connection, and there was no discernible downtime at any time.

As for the VM we did suspect it at first, so we checked the network, CPU, memory, disk. Only thing a bit high is disk latency which can reach 70ms and up to 300ms at its peak, though most of the time it's around 10-20ms. We did increase the disk share / priority to see if errors would go away, but didn't.
Resources: VM is on VMWare esxi 6.7, HP proliant host,
SME server has 4GB allocated memory, 250GB disk, 1 vCPU.
We can see ~75% memory use most of the time. CPU use usually really low.
The guest shares the host with 2 other VMs (less busy web servers).
We use SME server solely for email with email clients, <200 users, with ~1% using webmail.

As for network traffic,
The peak traffic is around 200 emails in an hour, counting internal emails.
External traffic bandwidth total is around 0.5GB per day.
Given error distributed at all hours, and not just during peak hours.

Thanks for the suggestion, we'll try adjusting the VM resources (CPU, memory) upwards and see if there's any improvement.
Just found this, will read up on this: https://wiki.contribs.org/Virtual_SME_Server




As for the SME Server config, here it is.
==================
Base configuration
==================

SME server version: 9.2
SME server mode:    serveronly
Running Kernel:     2.6.32-754.31.1.el6.x86_64



===========================
New RPMs not in base system
===========================
       
Loaded plugins: fastestmirror, post-transaction-actions, smeserver
Loading mirror speeds from cached hostfile
 * base: ftp.cuhk.edu.hk
 * smeaddons: mirrors.mab974.re
 * smeos: mirrors.mab974.re
 * smeupdates: mirrors.mab974.re
 * updates: mirror-hk.koddos.net
Extra Packages
lshw.x86_64                              2.17-1.el6.rf              @smecontribs
perl-Config-IniFiles.noarch              2.72-2.el6                 @smecontribs
perl-Filesys-DiskFree.noarch             0.06-1.2.el6.rf            @smecontribs
perl-Proc-ProcessTable.x86_64            0.48-1.el6                 @smecontribs
smeserver-affa.noarch                    3.2.2.3-8.el6.sme          @smecontribs
smeserver-crontab_manager.noarch         2.4-3.el6.sme              @smecontribs
smeserver-diskusage.noarch               0.2.0-2.el6.sme            @smecontribs
smeserver-hwinfo.noarch                  1.2-2.el6.sme              @smecontribs
smeserver-mailstats.noarch               1.1-11.el6.sme             @smecontribs
smeserver-qmHandle.noarch                1.4-12.el6.sme             @smecontribs
smeserver-userpanel.noarch               1.2-3.el6.sme              @smecontribs
smeserver-userpanels.noarch              1.1-5.el6.sme              @smecontribs
smeserver-vacation.noarch                1.1-26.el6.sme             @smecontribs
smeserver-wbl.noarch                     0.3.0-19.el6.sme           @smecontribs
 



===========================
Custom and modified templates
===========================
/etc/e-smith/templates-custom/var/service/qpsmtpd/config/relayclients/10CustomAllowRelay: MANUALLY_ADDED, ADDITION
/etc/e-smith/templates-custom/home/e-smith/ssl.crt: MANUALLY_ADDED, OVERRIDE




===========================
Modified events
===========================
/etc/e-smith/events/email-update/templates2expand/var/qmail/control/badmailfrom: MULTIPLE_RPM_OWNERS smeserver-qpsmtpd-2.4.0-39.el6.sme, smeserver-wbl-0.3.0-19.el6.sme
/etc/e-smith/events/email-update/templates2expand/var/service/qpsmtpd/config/badhelo: MULTIPLE_RPM_OWNERS smeserver-qpsmtpd-2.4.0-39.el6.sme, smeserver-wbl-0.3.0-19.el6.sme




=======================
Additional repositories
=======================

base: enabled
centosplus: disabled
contrib: disabled
extras: disabled
fasttrack: disabled
smeaddons: enabled
smecontribs: disabled
smedev: disabled
smeextras: enabled
smeos: enabled
smetest: disabled
smeupdates: enabled
smeupdates-testing: disabled
updates: enabled
         
« Last Edit: August 07, 2020, 02:40:46 PM by biogenic »

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #4 on: August 07, 2020, 03:09:20 PM »
Hi, thanks for responding.
By the way I'm sorry, I just remembered from your response that the messagelabs logs are  older and might be linked to a prior error we encountered - the "Too many connections" error. The network was fine, SME server was simply not accepting more connections. We did increase the limit from the default 10 to 40.  That's when this new error shows up.

I'll try to get the latest messagelabs logs for this exact problem and post it again.

That might help.....

Quote
We do monitor the network / Internet connection, and there was no discernible downtime at any time.

Quote
Given error distributed at all hours, and not just during peak hours.

Hence my gut feeling that it is not SME related per se, but something the host is doing that is upsetting the apple cart independent of what SME is doing.

Otherwise you'd have more consistent results - eg it would be  failing when you had a load of mail at the same time or something.

NB - I am known to be wrong frequently ;-)

Any reason for this?

Quote
/etc/e-smith/templates-custom/var/service/qpsmtpd/config/relayclients/10CustomAllowRelay: MANUALLY_ADDED, ADDITION

...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline biogenic

  • 12
  • +0/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #5 on: August 08, 2020, 04:12:32 AM »
Quote
Any reason for this?

Quote
    /etc/e-smith/templates-custom/var/service/qpsmtpd/config/relayclients/10CustomAllowRelay: MANUALLY_ADDED, ADDITION

Reason: internal server with legacy application sends to head office domain thru SME server.
I think this actually may no longer be needed though, as we later set "config setprop qpsmtpd UnauthenticatedRelayClients" as given here: https://bugs.contribs.org/show_bug.cgi?id=8704

----------------------------------
Thanks for your help, appreciate it.. Have limited experience, just learning SME Server..
Will do some more digging next week but for now, to summarize:

1) TLS with Messagelabs on port 25 fails all of the time (it seems) as shown by this log on all transactions:

2020-08-06 04:55:58.886774500 6137 (unrecognized_command) tls: TLS setup returning


It seems messagelabs everytime tries to "talk TLS" with the server but switches back to "non-TLS".
Is this a misguided interpretation?

2) The timeout problem goes both ways - receiving and also sending. Could be something about TLS but could also be a basic network connectivity issue unrelated to TLS?
Question: Can we disable TLS temporarily and observe? How? Is this a bad idea?

Code: [Select]
2020-08-07 09:25:27.419919500 delivery 5382: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/
2020-08-07 09:25:27.419923500 status: local 0/20 remote 9/20
2020-08-07 09:26:19.593111500 delivery 5176: deferral: Connected_to_67.219.246.206_but_connection_died._timed_out_(#4.4.2)/
2020-08-07 09:26:19.593114500 status: local 0/20 remote 8/20
2020-08-07 09:26:19.593114500 starting delivery 5406: msg 13371452 to remote aaa@b.com
2020-08-07 09:26:19.593115500 status: local 0/20 remote 9/20
2020-08-07 09:27:00.915908500 delivery 5398: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/
2020-08-07 09:27:00.915914500 status: local 0/20 remote 8/20
2020-08-07 09:27:00.940108500 delivery 5394: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/
2020-08-07 09:27:00.940113500 status: local 0/20 remote 7/20
2020-08-07 09:27:01.142764500 delivery 5393: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/
2020-08-07 09:27:01.142770500 status: local 0/20 remote 6/20
2020-08-07 09:27:01.145969500 delivery 5395: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/
2020-08-07 09:27:01.145974500 status: local 0/20 remote 5/20
2020-08-07 09:27:01.270157500 delivery 5399: deferral: 67.219.246.206_failed_after_I_sent_the_message./Remote_host_said:_451_timeout_(#4.4.2)/STARTTLS_proto=TLSv1.2;_cipher=ECDHE-RSA-AES256-GCM-SHA384;_subject=/C=US/ST=California/L=Mountain_View/O=Symantec_Corporation/OU=Symantec.cloud/CN=mail414.messagelabs.com;_issuer=/C=US/O=DigiCert_Inc/CN=DigiCert_SHA2_Secure_Server_CA;/

Quote
Hence my gut feeling that it is not SME related per se, but something the host is doing that is upsetting the apple cart independent of what SME is doing.

Otherwise you'd have more consistent results - eg it would be  failing when you had a load of mail at the same time or something.
 

Yeah, I guess we could monitor the network traffic see where it's getting dropped but, not well versed at all on TCP stuff, need to study.. :? Where's the best place to do a trace?

We checked the VM host, network setting is pass through. Although 3 guests are sharing 1 NIC, there doesn't seem to be congestion.

SME Server VM guest -> VMware host -> 2 network switches -> Fortigate firewall -> Internet -> Messagelabs

FIREWALL -> We've checked the firewall. It only has this related rule: allow all SMTP traffic on port 25 only from messagelabs cluster. And all the servers in the messagelabs cluster are defined properly.

However, here's a possible clue - a recent similar issue with Fortigate: https://www.reddit.com/r/fortinet/comments/i0xrgu/fg60e_641_smtp_starttls_connections_time_out/
The poster has this problem:

Quote
We have an Exchange server behind FG60E and some of the outbound SMTP connections time out after issuing a STARTTLS command (evidenced by a packet trace). The outbound firewall rule does not have email filter enabled and SSL inspection is set to certificate-only.

Furthermore, not all connections time out. For example, messages to Gmail go just fine, but connections to messagelabs.com don't (that also rules out certificate issues as Message Labs have good certs).

Plain text SMTP to the same hosts works fine - I've temporarily disabled TLS on the outbound Exchange connectors and all queued messages got sent immediately.

Will try to follow the suggestions on that thread first by next week.

3) We'll also open a ticket with Messagelabs just in case there is a problem on their end,
and get the logs posted here.

4) VM resource allocation
    Most probably not related to CPU and memory of the guest VM (did try increasing size, but just using <40% of memory and <20% of CPU at peak hour)
Also increased disk priority to high. Disk latency does burst up to 300ms at peak hour but goes back down to below 20. Would this have any effect?

5) Ciphers, etc.
If above investigation yields nothing, will go on with checking
https://maulwuff.de/research/ssl-debugging.html
Quote
FWIW see here for Cipher settings though I am not convinced they are the real issue:

https://wiki.contribs.org/SSL_Settings

Really appreciate the help much. Happy weekend!
« Last Edit: August 08, 2020, 05:16:44 AM by biogenic »

Offline mmccarn

  • *
  • 2,626
  • +10/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #6 on: August 08, 2020, 01:01:43 PM »
This TLS test tool from a messagelabs competitor might provide useful information: https://luxsci.com/smtp-tls-checker


Offline biogenic

  • 12
  • +0/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #7 on: August 11, 2020, 11:32:31 AM »
Quote
This TLS test tool from a messagelabs competitor might provide useful information: https://luxsci.com/smtp-tls-checker

Thanks for the tip, results seem good to me,
except for MX priority 20 results, is that cause for alarm?
 
Code: [Select]
cluster8.us.messagelabs.com
MX Priority: 10
SMTP TLS Enabled? Yes
TLS Certificate for: mail414.messagelabs.com
TLS Certificate Verifiable? Yes
SSL v3 Off? Yes
TLS v1.0 Supported? Yes
TLS v1.2 Supported? Yes
TLS v1.3 Supported? No
Weak Ciphers Off? Yes
NIST 800-52r2 Cipher Support? Yes
LuxSci SecureLine Compatability? Yes
SMTP MTA STS No


cluster8a.us.messagelabs.com
MX Priority: 20
Error: Connection failure.
-------------------------------------------------------------
In addition, for the firewall:
We have turned off all SSL inspection on the firewall, opened SMTPS on the firewall (previously only port 25).
However, the intermittent errors still occur with the same frequency.

We will check with Messagelabs next week.

Is there anything else we should check on our side or SME-server?
Thanks a lot!

Offline mmccarn

  • *
  • 2,626
  • +10/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #8 on: August 11, 2020, 12:47:45 PM »
Doh.

The test tool is only looking at your domain's MX records... which all point to messagelabs (obviously!) -- so it can't tell you anything about the SME server itself. Sorry for the red herring...

You would need a domain with an MX record pointing directly at the SME server for the luxsci tool to tell you anything useful.

To directly answer your question - no, I don't think a problem with messagelab's MX 20 server can be related to the issues you're having getting email from messagelabs delivered to your SME server.

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #9 on: August 11, 2020, 03:21:57 PM »
Couple of points here when eliminating things.

If we had lots of TLS/SSL receiving issues there'd be a pile of threads & bugs on it. But we don't.

So generally SME seems to be doing the right thing.

Next, message labs. I presume they send mail to lots of places...  we probably all send and receive messages via message lab servers without realising. So correctly confugured that seems OK.

So the answer most likely ends being somewhere inside your setup. Particular to your circumstances.

I also noticed this which you don't appear to have previously mentioned

Quote
2) The timeout problem goes both ways - receiving and also sending

Hmmmmm. You only mentioned receiving before??

Also:

Quote
Question: Can we disable TLS temporarily and observe? How? Is this a bad idea?

Disable it between message labs and SME indeed. Won't hurt as it is internal.

And remember, your encryption really guarantees nothing in terms of security because you can only control one 'hop'. After a mail goes off your site any upstream server could send it without an encrypted connection...... and that applies to received mail too.

The only real guarantee of security is pgp.

I'd go check and remove any unnecessary templates & settings just to eliminate issues.

Having seen the send failures I suspect an issue with message labs.

Try something like this to test the connection.

https://www.stevenrombauts.be/2018/12/test-smtp-with-telnet-or-openssl/
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation

Offline biogenic

  • 12
  • +0/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #10 on: August 20, 2020, 01:38:33 AM »
Solution below.
Thanks for everyone's help, it was really helpful!

About TLS, apparently we're not really using TLS with Messagelabs due to use of self-signed certificates... But, that was not really the problem.
------------------------------------------------------------
Here's the result of a test from Messagelabs to our SME Server.

>>> STARTTLS

FATAL:[50] STARTTLS: Certificate Verification Failed: The Certificate presented is not signed by a CA we trust.
FATAL:[50] STARTTLS: SSL connect attempt failed with unknown error error:14090086:SSL routines:ssl3_get_server_certificate:certificate verify failed
FATAL:[50] STARTTLS: Attempting to fetch raw cert and verify
FATAL[80] raw_connect_test failed. Error retrieving certificate: depth=0 C = --, ST = ----, L = Ottawa, O = XYZ Corporation, OU = Main, CN = mail.x.com, emailAddress = admin@x.com
verify error:num=18:self signed certificate
verify return:1

---------------------------------------------------------------------------------------------------
So the "TLS setup returning" probably does mean that it couldn't do TLS, and is reverting to non-TLS.


2020-08-06 04:55:15.814714500 6137 dispatching STARTTLS
2020-08-06 04:55:15.815018500 6137 220 Go ahead with TLS
2020-08-06 04:55:58.886774500 6137 (unrecognized_command) tls: TLS setup returning
2020-08-06 04:55:59.149296500 6137 dispatching EHLO mail6.bemta23.messagelabs.com

----------------------------------

Anyway, we were able to find a solution on a hunch.
It was not related to TLS, but rather network path MTU.

We ran the following commands and it totally fixed the problem, not one instance of the error remains.

Code: [Select]

tracepath messagelabsserverclustername.com

Then the result was there was one hop somewhere after our ISP with lower MTU = 1496
compared to SME server's 1500.

Code: [Select]
config setprop InternalInterface MTU 1496

signal-event post-upgrade; signal-event reboot




So apparently, it was indeed a kind of timeout issue.

As found here:
https://wiki.contribs.org/SME_Server:Documentation:FAQ:Section04#qpsmtpd_.22Connection_Timed_Out.22_errors

qpsmtpd "Connection Timed Out" errors

See Bugzilla:6888 and Bugzilla:2360

A qpsmtpd timeout error may arise, this is not an issue that is caused by SME server directly, however it can become an issue depending on hardware and configuration settings that are contained in and around other enviroments.

It is discussed under various names

    Path MTU Discovery Blackhole http://www.phildev.net/mss/mss-talk.pdf
    Path MTU Discovery Failures http://www.wand.net.nz/~mluckie/pubs/debugging-pmtud.imc2005.pdf
    TCP Problems with Path MTU Discovery http://www.ietf.org/rfc/rfc2923.txt

As discussed in Bugzilla:6888 a workaround was found that may help in mitigating the issue.

The tracepath utility (included with SME 8.0 and SME 7.6) can be used to locate non-standard MTU values between your SME server and any remote host.


Offline TerryF

  • grumpy old man
  • *
  • 1,821
  • +6/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #11 on: August 20, 2020, 02:05:46 AM »
Nice find, one of those insects that can bug you to total distraction and no amount of insect spray, poisons etc get rid of them :-) can effect ones health :-)
--
qui scribit bis legit

Offline ReetP

  • *
  • 3,722
  • +5/-0
Re: TLS failed: Could not create SSL socket -intermittent
« Reply #12 on: September 04, 2020, 04:27:35 PM »
Solution below.
Thanks for everyone's help, it was really helpful!

Always nice to see someone get a resolution!

Quote
About TLS, apparently we're not really using TLS with Messagelabs due to use of self-signed certificates... But, that was not really the problem.
..........
It was not related to TLS, but rather network path MTU.

Nice one!

I think that SSL/TLS is pretty sensitive to any forms of interruption. I do remember seeing this somewhere before.
...
1. Read the Manual
2. Read the Wiki
3. Don't ask for support on Unsupported versions of software
4. I have a job, wife, and kids and do this in my spare time. If you want something fixed, please help.

Bugs are easier than you think: http://wiki.contribs.org/Bugzilla_Help

If you love SME and don't want to lose it, join in: http://wiki.contribs.org/Koozali_Foundation