Saturday, February 25, 2012

Messages stuck in sys.transmission_queue

Hi,

At my company, we're trying to use service broker to create a client-server system where there is a head office machine and multiple outlets registered with that head office. My problem is that sometimes when a branch sends a message to the head office, it just seems to sit in the transmission queue and never gets sent. If I run a script that forcibly ends the conversations on the client machine (with cleanup), storing the message bodies and then resend them, they seem to get through fine.

The way that we send messages is by calling a t-sql stored procedure from a c# application using SqlCommand (don't know if this should make any difference).

If I monitor the Head Office machine and one of the Outlets while this is happening, on the HO I get three events in a row:

Broker: Message Classify (1 - Local) Audit Broker Conversation (2 - No Certificate) Broker: Message Undeliverable (1 - Sequenced Message)
The TextData contained in the third event is: This message could not be delivered because the security context could not be retrieved.

The RoleName of the server is Initiator, and the TargetUserName is the name of the service on the Outlet.

On the Outlet I get the following event repeatedly (presumably as it continues to try sending the message) - Broker: Remote Message Acknowledgement (1 - Message With Acknowledgement Sent).

On the client the RoleName also appears to be Initiator, and the TargetUserName is blank.

This would make me suspect that certificates were missing or something, except that if I remove messages from the queue and resend them they seem to get through, and also I've checked both databases and they have the correct certificates.

Any ideas?

Thanks in advance,

AdamThanks for the detailed info, it is always helpfull to have the full details when trying to diagnose an issue.
The most likely culprit in such cases is an account that has more certificates to be picked from when encrypting/signing a message. Whenever SSB has more than one certificates 'valid' to be used for a particular database principal, it will pick the one with the latest expiration date. So sometimes even though the configuration 'seems' correct, SSB will pick the 'wrong' certificate simply because it has multiple choices. The typical account prone to this problems is 'dbo', as certs are being created for dbo for various reasons and by default they are all valid for SSB to pick.
To confirm this problem, the 2nd event in your post (the Audit Broker Conversation one) actually tells which certificate was not found: it has the issuer name and serial number of the cert 'not found' on two of the columns (I can't remember now which ones and the BOL 'ommits' this detail). You can then look into the message sender's database to see why that particular cert was picked.
Certs can actualy be controled whether they are available for SSB to 'pick' by turning off the ACTIVE_FOR_BEGIN_DIALOG option on them (the option refers to certs used for the reverse path from target to initiator as well, despite the name 'begin_dalog'...)|||Thanks for the reply. What you're saying makes sense, and seems possible to me (we re-registered one of the outlets recently which could have caused this). The only problem is that the Audit Broker Conversation event doesn't appear to have the issuer or serial number in any of the columns. Maybe I'm using profiler wrong or something, but I would have thought that just selecting the event would have been enough. Any idea what I'm doing wrong?

Adam

|||

IssuerName is DBUserName and SerialNumber is in TargetLoginName.

Make sure the columns above are selected when defining the trace.

|||I had worked out that the IssuerName was DBUserName, but hadn't noticed the serial number. Thanks for explaining that. The thing is that the IssuerName isn't much use to me because the way I generated the certificates, they all have the same name Smile.

But anyhow, I found a solution to the problem. Basically we had an old database on the same server that had broker enabled as well, and I'm guessing that since they both have the same service names in them, that was causing the problems. I disabled service broker on the old db and everything seems to be working again. The only thing I'm worried about now is how the broker became enabled on the database - but that's not really a service broker issue.

Thanks for the help.

Adam
|||

Whenever SSB finds multiple instances of the same service it assumes a load balancing scenario and will hash the incomming conversations to the available service instances. Specifying a broker_instance in the begin dialog from the initiator would make the application stick to a specific service (database), even if more instances of the service show up on the target.

No comments:

Post a Comment