Saturday, February 25, 2012

messages stuck in sys.transmission_queue

Sorry for the stupid question, but I can't seem to figure it out...

There are 119 messages stuck in the transmission queue, all for the same queue. When I check the status of the queue (via sys.service_queues), is_receive_enabled = 1, is_activation_enabled = 1, and max_readers = 3. But when I look for an active queue monitor (via sys.dm_broker_queue_monitors), nobody is watching this queue. What would cause a queue to be active and enabled but have nobody monitoring it? Did something go wrong internally that made these (outbound) messages get stuck in the transmission queue without showing up in the views? How can I get these messages "un-stuck" and flowing through the system?
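For reference, the two checks described above could be written roughly like this. It is only a sketch: dbo.MyQueue is a placeholder name, not the real queue, and the extra columns selected are just the ones that seemed useful to look at.

```sql
-- Assumption: dbo.MyQueue is a hypothetical placeholder for the affected queue.

-- 1. Queue settings as reported by the catalog view.
SELECT q.name,
       q.is_receive_enabled,
       q.is_enqueue_enabled,
       q.is_activation_enabled,
       q.max_readers,
       q.activation_procedure
FROM sys.service_queues AS q
WHERE q.object_id = OBJECT_ID(N'dbo.MyQueue');

-- 2. Is there a queue monitor for it? The LEFT JOIN keeps the queue row even
--    when no monitor exists, in which case the monitor columns come back NULL.
SELECT q.name,
       m.state,
       m.last_activated_time,
       m.last_empty_rowset_time,
       m.tasks_waiting
FROM sys.service_queues AS q
LEFT JOIN sys.dm_broker_queue_monitors AS m
       ON m.queue_id = q.object_id
      AND m.database_id = DB_ID()
WHERE q.object_id = OBJECT_ID(N'dbo.MyQueue');
```

A NULL in the state column of the second result is the "nobody is watching" situation described above.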

Another problem I am seeing is that the return messages to this queue (which signify that the target has consumed the message, and end the conversation) are not being consumed, so the conversations get stuck in the "DI" state.

Any suggestions would be greatly appreciated.

Thanks in advance,

John Hennesey

p.s. The transmission status is blank for all 119 rows.

|||

Are all messages in transmission queue on the same dialog or on different dialogs?
Are the messages remote or local (if local, is it the same instance or the same database)?
If you attach Profiler and monitor the events in the Broker category, does it show any activity?
Is there a route for the messages' destination service in the database?
Do newly sent messages end up in the same situation, or do they get through fine?
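
Some of the questions above can be answered straight from the catalog views. The queries below are a minimal sketch, run in the database that holds the stuck messages; nothing in them is specific to your system.

```sql
-- How are the stuck messages spread across dialogs and destination services?
SELECT tq.conversation_handle,
       tq.to_service_name,
       tq.to_broker_instance,
       tq.transmission_status,
       COUNT(*) AS stuck_messages
FROM sys.transmission_queue AS tq
GROUP BY tq.conversation_handle,
         tq.to_service_name,
         tq.to_broker_instance,
         tq.transmission_status
ORDER BY stuck_messages DESC;

-- Which routes exist in this database? Compare remote_service_name with the
-- to_service_name values returned by the query above.
SELECT r.name,
       r.remote_service_name,
       r.broker_instance,
       r.address
FROM sys.routes AS r;
```

A route whose remote_service_name is NULL matches any service, so the default AutoCreatedLocal route would show up that way.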

To trigger an internal 'recycle' of the whole broker 'machinery' you can disable and then re-enable the broker (ALTER DATABASE ... SET DISABLE_BROKER / ENABLE_BROKER), but I'd like you to try Profiler first to confirm whether there really isn't any broker activity for those messages.
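
Spelled out, the recycle would look something like the sketch below. YourDatabase is a placeholder, and WITH ROLLBACK IMMEDIATE is my assumption for getting the exclusive database access these statements need: it rolls back any open transactions in that database, so use it only when that is acceptable.

```sql
-- Assumption: YourDatabase is a hypothetical placeholder for the affected database.

-- Check the current broker state first.
SELECT name, is_broker_enabled
FROM sys.databases
WHERE name = N'YourDatabase';

-- Stop message delivery for the database. Messages stay in their queues.
ALTER DATABASE [YourDatabase] SET DISABLE_BROKER WITH ROLLBACK IMMEDIATE;

-- Start message delivery again, keeping the existing broker identifier.
ALTER DATABASE [YourDatabase] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
```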

HTH,
~ Remus

|||

Remus - thank you so much for your quick response. I was hoping you would see this... :) To answer your questions:

Are all messages in transmission queue on the same dialog or on different dialogs?

When you say dialog, do you mean the same conversation? The conversation handles are indeed different.

|||

I am puzzled myself. My recommendation would be to focus on one individual conversation that exposes the problem. Look up the conversation_handle in sys.transmission_queue for any one of the stuck messages. Starting from this, investigate as follows:
- find the corresponding conversation (the one that owns the message) in sys.conversation_endpoints
- using the conversation_id, find the peer conversation handle (initiator and target both share the same conversation_id value)
- what states are the two conversation endpoints found in the previous step in? For messages to travel, both should be in the CONVERSING state.
- check for discrepancies between the send_sequence and receive_sequence of the two conversation endpoints. In each direction (initiator to target and target to initiator) there should be a contiguous sequence of message numbers: i.e. if the initiator's send_sequence is 10 and the target's receive_sequence is 5, the messages numbered 5,6..10 should all be in the transmission_queue. See if you can spot any gap (e.g. send_sequence is 10, the peer's receive_sequence is 9, but message 10 is missing from transmission_queue) or overlap (e.g. send_sequence is 10, receive_sequence is also 10, but message 10 was not yet deleted from transmission_queue)
- monitor again the Profiler broker events, but filter the events only for the conversation you're focusing on. The meaning of each column displayed in the Profiler for broker events is documented here: http://msdn2.microsoft.com/en-us/library/ms186347.aspx and you can filter based on a given conversation_id (the one you're focusing on).
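
The catalog part of the steps above can be sketched as follows. This assumes both endpoints live in the same database; if the peer is in another database or instance, the first query returns only the local endpoint and has to be repeated on the other side. Picking the handle with TOP (1) is just for illustration.

```sql
-- Pick one stuck conversation to focus on (arbitrary choice, for illustration).
DECLARE @handle uniqueidentifier;

SELECT TOP (1) @handle = conversation_handle
FROM sys.transmission_queue;

-- Both endpoints of that conversation share the same conversation_id.
SELECT ce.conversation_handle,
       ce.conversation_id,
       ce.is_initiator,
       ce.state,
       ce.state_desc,
       ce.far_service,
       ce.send_sequence,
       ce.receive_sequence
FROM sys.conversation_endpoints AS ce
WHERE ce.conversation_id = (SELECT conversation_id
                            FROM sys.conversation_endpoints
                            WHERE conversation_handle = @handle);

-- Messages still pending for the chosen handle, in sequence order, to compare
-- against the send_sequence / receive_sequence values returned above.
SELECT tq.message_sequence_number,
       tq.message_fragment_number,
       tq.message_type_name,
       tq.is_end_of_dialog,
       tq.enqueue_time,
       tq.transmission_status
FROM sys.transmission_queue AS tq
WHERE tq.conversation_handle = @handle
ORDER BY tq.message_sequence_number, tq.message_fragment_number;
```

The conversation_id from the second result is also the value to filter on in Profiler.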

HTH,
~ Remus

|||

Remus - thank you very much for the response. Today we had to deactivate our queues (for maintenance purposes), and I jumped at the chance to also deactivate the queue in question. Turns out we have a script that deactivates the queues every morning, kicks off a cube processing event (so it will be built on static data), then reactivates the queues. I didn't know it, but this queue is not part of that script. I think this script was the culprit - somehow when things were processing, it severed something, somewhere.

It made it look like the queue was enabled and active, all signs pointed to everything alive and well, but nothing was monitoring the queue (via sys.dm_broker_queue_monitors). We deactivated the queue, disabled the queue, enabled the queue and reactivated the queue and everything started flowing smoothly. Queue counts are going down, our conversations are closing properly and the sys.conversation_endpoints view is showing CD conversations being purged after the 30 minute period. Awesome!
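
In case it helps anyone else, the sequence we ran was along these lines. This is a sketch from memory, and dbo.MyQueue is a placeholder rather than our real queue name.

```sql
-- Assumption: dbo.MyQueue is a hypothetical placeholder for the affected queue.

-- Deactivate: stop launching the activation procedure.
ALTER QUEUE dbo.MyQueue WITH ACTIVATION (STATUS = OFF);

-- Disable, then enable the queue itself.
ALTER QUEUE dbo.MyQueue WITH STATUS = OFF;
ALTER QUEUE dbo.MyQueue WITH STATUS = ON;

-- Reactivate, reusing the activation settings already defined on the queue.
ALTER QUEUE dbo.MyQueue WITH ACTIVATION (STATUS = ON);

-- Watch the backlog drain and the conversations close.
SELECT COUNT(*) AS pending_messages
FROM sys.transmission_queue;

SELECT state_desc, COUNT(*) AS conversations
FROM sys.conversation_endpoints
GROUP BY state_desc;
```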

I still don't know exactly what caused it, nor did I get the chance to really dive into your info from the last post, but some of the things you mentioned got me thinking down this path. I really appreciate your help!

If there are any questions you have from me, please do not hesitate to ask - i.e. if the product group is interested in some of the processes we do that may have caused the queue to be in this state. Otherwise, I will mark this as answer.

Thanks again - have a great weekend.

John

|||

My recommendation would be to describe this problem at http://connect.microsoft.com/SQLServer/Feedback

Thanks,
~ Remus
