Showing posts with label stuck. Show all posts
Showing posts with label stuck. Show all posts

Saturday, February 25, 2012

messages stuck in sys.transmission_queue

Sorry for the stupid question, but I can't seem to figure it out...

There are 119 messages that are stuck in the transmission queue, all for the same queue. When I check the status of the queue (via sys.service_queues), is_receive_enabled = 1, is_activation_enabled = 1, and max_readers = 3. When I check to see if there is an active queue monitor (via sys.dm_broker_queue_monitors) there is nobody watching this queue. What would cause this queue to be active, enabled but have nobody montioring it? Is there something internal that went wrong that made these (outbound) messages get stuck in the transmission queue, and is not showing up in the views? How can I get these messages "un-stuck" and flow through the system?

A problem I am seeing is the return message to this queue (to signify the target has consumed the message, and to end the conversation) are not being consumed, thus getting stuck in the "DI" state.

Any suggestions would be greatly appreciated.

Thanks in advance,

John Hennesey

p.s. The transmission status is blank for all 119 rows.|||

Are all messages in transmission queue on the same dialog or on different dialogs?
Are the messages remote or local (if local, is it same instance or same db) ?
Attaching the profiler and monitoring for events in the Broker category shows any activity?
Is there any route for the messages destination service in the database?
Do new messages sent end up in the same situation or they get trough fine?

To trigger an internal 'recycle' of the whole broker 'machinery' you can disable and then enable back the broker (ALTER DATABASE ... SET DISABLE_BROKER / ENABLE_BROKER), but I'd like you to try the profiler first to confirm if there realy isn't any broker activity for those messages.

HTH,
~ Remus

|||

Remus - thank you so much for your quick response. I was hoping you would see this... :) To answer your questions:

Are all messages in transmission queue on the same dialog or on different dialogs?

When you say dialog, do you mean the same conversation? The conversation handles are indeed different.|||

I am puzzled myself. My recommendation would be to focus on one individual conversation that exposes the problem. Lookup the conversation_handle in sys.transmission_queue for any of the messages stuck (pick one). Starthing from this, investigate as follows:
- find the corresponding conversation (that owns the message) in sys.conversation_handles
- using the conversation_id, find the peer conversation handle (initiator and target both share the same conversation_id value)
- what states are the two conversation endpoints found in the previous step? For messages to travel, both should be in CONVERSING state.
- check if there are discrepancies between the send_sequence and receive_sequence between the two conversation endpoints. In each direction (initiator to target and target to initiator) there should be a contigous sequence of message numbers: i.e. if initiator's send_sequence is 10 and target's receive_sequence is 5, the messages numbered 5,6..10 should all be in the transmission_queue. See if you can spot any gap (e.g. send_sequence 10, peer's receive_sequence is 9, but message 10 is missing from transmission_queue) or overlap (e.g. send_sequence 10, receive_sequence also 10 but message 10 was not yet deleted from transmission_queue)
- monitor again the Profiler broker events, but filter the events only for the conversation you're focusing on. The meaning of each column displayed in the Profiler for broker events is documented here: http://msdn2.microsoft.com/en-us/library/ms186347.aspx and you can filter based on a given conversation_id (the one you're focusing on).

HTH,
~ Remus

|||

Remus - thank you very much for the response. Today we had to deactivate our queues (for maintenance purposes), and I jumped at the chance to also deactivate the queue in question. Turns out we have a script that deactivates the queues every morning, kicks off a cube processing event (so it will be built on static data), then reactivates the queues. I didn't know it, but this queue is not part of that script. I think this script was the culprit - somehow when things were processing, it severed something, somewhere.

It made it look like the queue was enabled and active, all signs pointed to everything alive and well, but nothing was monitoring the queue (via sys.dm_broker_queue_monitors). We deactivated the queue, disabled the queue, enabled the queue and reactivated the queue and everything started flowing smoothly. Queue counts are going down, our conversations are closing properly and the sys.conversation_endpoints view is showing CD conversations being purged after the 30 minute period. Awesome!

I still don't know exactly what caused it, nor did I get the chance to really dive into your info from the last post, but some of the thing you mentioned got me thinking down this path. I really appreciate your help!

If there are any questions you have from me, please do not hesitate to ask - i.e. if the product group is interested in some of the processes we do that may have caused the queue to be in this state. Otherwise, I will mark this as answer.

Thanks again - have a great weekend.

John

|||

My recommendation would be to describe this problem at http://connect.microsoft.com/SQLServer/Feedback

Thanks,
~ Remus

Messages stuck in sys.transmission_queue

Hi,

At my company, we're trying to use service broker to create a client-server system where there is a head office machine and multiple outlets registered with that head office. My problem is that sometimes when a branch sends a message to the head office, it just seems to sit in the transmission queue and never gets sent. If I run a script that forcibly ends the conversations on the client machine (with cleanup), storing the message bodies and then resend them, they seem to get through fine.

The way that we send messages is by calling a t-sql stored procedure from a c# application using SqlCommand (don't know if this should make any difference).

If I monitor the Head Office machine and one of the Outlets while this is happening, on the HO I get three events in a row:

Broker: Message Classify (1 - Local) Audit Broker Conversation (2 - No Certificate) Broker: Message Undeliverable (1 - Sequenced Message)
The TextData contained in the third event is: This message could not be delivered because the security context could not be retrieved.

The RoleName of the server is Initiator, and the TargetUserName is the name of the service on the Outlet.

On the Outlet I get the following event repeatedly (presumably as it continues to try sending the message) - Broker: Remote Message Acknowledgement (1 - Message With Acknowledgement Sent).

On the client the RoleName also appears to be Initiator, and the TargetUserName is blank.

This would make me suspect that certificates were missing or something, except that if I remove messages from the queue and resend them they seem to get through, and also I've checked both databases and they have the correct certificates.

Any ideas?

Thanks in advance,

AdamThanks for the detailed info, it is always helpfull to have the full details when trying to diagnose an issue.
The most likely culprit in such cases is an account that has more certificates to be picked from when encrypting/signing a message. Whenever SSB has more than one certificates 'valid' to be used for a particular database principal, it will pick the one with the latest expiration date. So sometimes even though the configuration 'seems' correct, SSB will pick the 'wrong' certificate simply because it has multiple choices. The typical account prone to this problems is 'dbo', as certs are being created for dbo for various reasons and by default they are all valid for SSB to pick.
To confirm this problem, the 2nd event in your post (the Audit Broker Conversation one) actually tells which certificate was not found: it has the issuer name and serial number of the cert 'not found' on two of the columns (I can't remember now which ones and the BOL 'ommits' this detail). You can then look into the message sender's database to see why that particular cert was picked.
Certs can actualy be controled whether they are available for SSB to 'pick' by turning off the ACTIVE_FOR_BEGIN_DIALOG option on them (the option refers to certs used for the reverse path from target to initiator as well, despite the name 'begin_dalog'...)|||Thanks for the reply. What you're saying makes sense, and seems possible to me (we re-registered one of the outlets recently which could have caused this). The only problem is that the Audit Broker Conversation event doesn't appear to have the issuer or serial number in any of the columns. Maybe I'm using profiler wrong or something, but I would have thought that just selecting the event would have been enough. Any idea what I'm doing wrong?

Adam

|||

IssuerName is DBUserName and SerialNumber is in TargetLoginName.

Make sure the columns above are selected when defining the trace.

|||I had worked out that the IssuerName was DBUserName, but hadn't noticed the serial number. Thanks for explaining that. The thing is that the IssuerName isn't much use to me because the way I generated the certificates, they all have the same name Smile.

But anyhow, I found a solution to the problem. Basically we had an old database on the same server that had broker enabled as well, and I'm guessing that since they both have the same service names in them, that was causing the problems. I disabled service broker on the old db and everything seems to be working again. The only thing I'm worried about now is how the broker became enabled on the database - but that's not really a service broker issue.

Thanks for the help.

Adam
|||

Whenever SSB finds multiple instances of the same service it assumes a load balancing scenario and will hash the incomming conversations to the available service instances. Specifying a broker_instance in the begin dialog from the initiator would make the application stick to a specific service (database), even if more instances of the service show up on the target.