
vRA7 EBS Times Out Even When Workflows Succeed

We had an issue today where the vRealize Automation (vRA) 7 Event Broker Service (EBS) would time out. The timeouts happened intermittently, at different stages of the provisioning lifecycle. We first noticed that something was wrong when extensibility workflow calls to vRealize Orchestrator (vRO) returned after the vRO workflows completed, but the provisioning lifecycle state of the virtual machine failed to progress and the request eventually failed with an EBS timeout message.

While investigating, I found that the RabbitMQ configuration in this distributed HA vRA deployment did not look right. In the VAMI portal of each vRA Cafe appliance, only the local RabbitMQ instance was listed as a cluster node. Running the following command on the command line of each vRA appliance confirmed my suspicion that RabbitMQ was not clustered (server names have been changed for this post):

vranode1:~ # rabbitmqctl cluster_status
Cluster status of node rabbit@vranode1 ...
[{nodes,[{disc,[rabbit@vranode1]}]},
{running_nodes,[rabbit@vranode1]},
{cluster_name,<<"rabbit@vranode1.domain">>},
{partitions,[]},
{alarms,[{rabbit@vranode1,[]}]}]

vranode2:~ # rabbitmqctl cluster_status
Cluster status of node rabbit@vranode2 ...
[{nodes,[{disc,[rabbit@vranode2]}]},
{running_nodes,[rabbit@vranode2]},
{cluster_name,<<"rabbit@vranode2.domain">>},
{partitions,[]},
{alarms,[{rabbit@vranode2,[]}]}]
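The same check can be scripted by counting the entries in the running_nodes term. This is only a minimal sketch: it assumes the Erlang-term output format shown above (other RabbitMQ releases may print the status differently), and the function name count_running_nodes is made up for this post.

```shell
#!/bin/sh
# Reads `rabbitmqctl cluster_status` output on stdin and prints how many
# nodes appear in the {running_nodes,[...]} term.
# Assumption: the output uses the Erlang-term layout shown in this post.
count_running_nodes() {
    # Join the output into one line, isolate the running_nodes term,
    # then count the name@host entries inside it.
    tr -d '\n ' \
      | grep -o '{running_nodes,\[[^]]*\]}' \
      | grep -o '[A-Za-z0-9_.-]*@[A-Za-z0-9_.-]*' \
      | wc -l \
      | tr -d ' '
}
```

It would be used as `rabbitmqctl cluster_status | count_running_nodes` on each appliance. In a two-appliance deployment, a count of 1 suggests that the local RabbitMQ instance is running standalone.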

For the EBS to function correctly, it needs access to the RabbitMQ queues. If the RabbitMQ instances on the vRA nodes are not clustered, messages in RabbitMQ queues on one appliance are not visible to the other appliance. So if an event is placed in a queue on appliance 1 while appliance 2 is waiting for a notification from the queue before it can process a task related to that event, the message never reaches the queue on appliance 2 and the task times out after a default period of 30 minutes.

To fix the issue, take snapshots of the vRA appliances, log in to vranode2 via SSH, and issue the following commands to form a RabbitMQ cluster:

rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@vranode1
rabbitmqctl start_app
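If you prefer to run these steps as one unit, they can be wrapped in a small function that stops at the first failure and restarts the local RabbitMQ application if the join does not succeed. This is a sketch rather than a tested procedure: the function name join_rabbit_cluster and the RABBITMQCTL override variable are inventions for this post, and the peer is passed as the full node name reported by cluster_status (for example rabbit@vranode1).

```shell
#!/bin/sh
# RABBITMQCTL is a hypothetical override, e.g. "echo rabbitmqctl" for a dry run.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# Joins the local RabbitMQ node to the cluster that the given peer belongs to.
join_rabbit_cluster() {
    peer="$1"   # node name of an existing member, e.g. rabbit@vranode1
    if [ -z "$peer" ]; then
        echo "usage: join_rabbit_cluster <node>" >&2
        return 2
    fi

    # The RabbitMQ application must be stopped before joining.
    $RABBITMQCTL stop_app || return 1

    if ! $RABBITMQCTL join_cluster "$peer"; then
        # Do not leave the local application stopped if the join fails.
        echo "join_cluster failed; restarting local app" >&2
        $RABBITMQCTL start_app
        return 1
    fi

    $RABBITMQCTL start_app
}
```

Running `RABBITMQCTL="echo rabbitmqctl" join_rabbit_cluster rabbit@vranode1` first prints the three commands without executing them, which is a cheap way to review what would run on vranode2.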

Then check the cluster configuration again to confirm that the cluster was formed successfully:

vranode1:/etc/rabbitmq # rabbitmqctl cluster_status
Cluster status of node rabbit@vranode1 ...
[{nodes,[{disc,[rabbit@vranode1,rabbit@vranode2]}]},
{running_nodes,[rabbit@vranode2,rabbit@vranode1]},
{cluster_name,<<"rabbit@vranode1.domain">>},
{partitions,[]},
{alarms,[{rabbit@vranode2,[]},{rabbit@vranode1,[]}]}]
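For a repeatable check, the running_nodes term can be tested for each node you expect to see. Again, this is a sketch under assumptions: it relies on the output format shown above, and all_nodes_running is a name invented for this post.

```shell
#!/bin/sh
# Reads `rabbitmqctl cluster_status` output on stdin and succeeds only if
# every node name given as an argument is listed in running_nodes.
# Assumption: the output uses the Erlang-term layout shown in this post.
all_nodes_running() {
    # Collapse the output to one line and keep only the running_nodes term.
    running="$(tr -d '\n ' | grep -o '{running_nodes,\[[^]]*\]}')"
    for node in "$@"; do
        case "$running" in
            # Match the node as a whole list element, not as a prefix.
            *"[$node]"*|*"[$node,"*|*",$node,"*|*",$node]"*) ;;
            *) echo "not running: $node" >&2; return 1 ;;
        esac
    done
}
```

For this deployment the call would be `rabbitmqctl cluster_status | all_nodes_running rabbit@vranode1 rabbit@vranode2`, and a non-zero exit status means at least one expected node is missing from the cluster.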

With RabbitMQ now in a clustered configuration, the vRA EBS should work as expected.

