TLD CRM BoostPatch 10.2.0 – Queryless Ping

TLDialer – Queryless Ping

Third-party vendors are consuming our Dialer Ready API (Agent Availability Ping) at alarming rates. We are seeing massive spikes, especially around 11 AM – 2 PM EST and around 4 – 6 PM EST. A lot of this comes from inefficiencies in RTB (Real-Time Bidding) systems pinging multiple times for the same call, from clients and vendors setting their ping retries to ridiculous rates (like 10+ back-to-back retries on failures or unwanted results), and from vast numbers of misconfigured pings that don’t do anything useful or perform well at all. (Note: We are attempting to fix that in an upcoming Vendor Rework.) We even have some inactive clients that still get pinged even though they don’t operate anymore!

Nearly 2/3 of all TLD Traffic is to the Availability API and it keeps growing.

What Happened Yesterday – 12/13/2024?

Yesterday we had a massive spike in Dialer Ready traffic that did not subside, which overloaded the MemoryStore we use for caching and processing these requests. As the MemoryStore filled up, it started evicting cache keys and missing cache hits, which increased latency across the entire system. The errors and timeouts that were occurring only made the problem worse: external vendors and call routers slammed us even harder because of the excessive retry policies in their call router configurations (some were set to retry 11+ times), which slowed everything to a crawl. While the system didn’t go down completely, there were intermittent connectivity issues, problems with real-time updates, etc.
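To illustrate why back-to-back retries hurt so much during an event like this, here is a minimal sketch of the alternative on the vendor side: a small number of retries spaced out with capped exponential backoff and jitter. This is not TLD code or any call router's actual configuration. The function names, the retry count, and the delay values are all assumptions chosen for illustration.

```python
import random
import time


def backoff_delays(max_retries=3, base=0.5, cap=8.0, rng=random.random):
    """Yield the wait in seconds before each retry.

    The ceiling doubles on every attempt up to `cap`, and the actual wait is
    a random fraction of that ceiling so many callers do not retry in lockstep.
    All default values are illustrative assumptions.
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


def ping_with_backoff(send_ping, delays, sleep=time.sleep):
    """Call `send_ping` once, then retry after each delay while it fails.

    `send_ping` is a hypothetical callable that returns a response, or None
    on a failure or timeout.
    """
    result = send_ping()
    for delay in delays:
        if result is not None:
            break
        sleep(delay)
        result = send_ping()
    return result
```

With three retries, a failing ping produces at most four requests spread over several seconds, instead of 11+ requests arriving at the moment the system is already struggling.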

We realized what the issue was pretty quickly, as it was clearly the MemoryStore timing out. We first tried to investigate where the traffic was coming from so we could notify the vendors that they might have a misconfiguration, but even after we identified and contacted some of them, the traffic did not subside. We then decided we needed to upgrade the MemoryStore in place to handle the spike in traffic, and that is where the delay came from.

AWS took far too long to upgrade the MemoryStore. After waiting and waiting, the only solution left was to spin up an entirely new MemoryStore and reconfigure the CRM and backend systems to use it. Once we made the switch, all problems subsided. (The original MemoryStore upgrade completed nearly 5 hours after we switched to the new one…)

After this event we worked tirelessly throughout the night to finalize and release a refactored Dialer Ready API that had already been in development and that functions via a “query-less” availability check.

What does Query-less mean?

It means that instead of querying the Dialer Live Agents database table (which is a very sensitive table under load), the API now uses a real-time, up-to-the-second MemoryStore lookup. It parses the data in the same fashion and returns the same results the database would show, but it is faster, more efficient and, more importantly, allows us to avoid querying the database and caching results for every ping. This lets us move the Dialer Ready Ping (and, in the future, other internal queries) completely off the database. We now update this MemoryStore every second by hooking into the TLDialer Primary Emitter Process we are already running, which effectively costs us no extra queries. The best part is that these queries are, and always have been, a static set of 14 queries, so regardless of how many agents are online or how many vendors exist, the number of queries never changes, since they are built in a very efficient manner (this has been the core of how TLDialer works since its inception).
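The idea can be sketched roughly as follows. This is a simplified illustration, not the actual TLDialer implementation: the class and function names, the status strings other than DEAD, the response shape, and the staleness guard are all assumptions, and a plain in-process dictionary stands in for the MemoryStore.

```python
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    agent_id: str
    ingroup: str
    status: str  # hypothetical values such as "READY", "INCALL", "DEAD"


class AvailabilitySnapshot:
    """In-process stand-in for the MemoryStore (assumption for this sketch)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready_by_ingroup = {}
        self._updated_at = 0.0

    def publish(self, agents, now=None):
        """Called about once per second by the emitter with data it already has."""
        counts = {}
        for agent in agents:
            if agent.status == "READY":  # DEAD and busy agents never count
                counts[agent.ingroup] = counts.get(agent.ingroup, 0) + 1
        with self._lock:
            self._ready_by_ingroup = counts
            self._updated_at = time.time() if now is None else now

    def ready_count(self, ingroup, max_age=5.0, now=None):
        """Answer from memory only; a stale snapshot reports zero (assumed guard)."""
        now = time.time() if now is None else now
        with self._lock:
            if now - self._updated_at > max_age:
                return 0
            return self._ready_by_ingroup.get(ingroup, 0)


def dialer_ready_ping(snapshot, ingroup):
    """Handle one availability ping without touching the database."""
    ready = snapshot.ready_count(ingroup)
    return {"ingroup": ingroup, "ready": ready > 0, "agents_ready": ready}
```

The point of the design is that the cost of `publish` is fixed by the emitter's once-per-second schedule, while the cost of each ping is a single in-memory lookup, so ping volume no longer translates into database load.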

Results

Although the update has been live for only a few hours, we have already seen average ping response times cut by more than half, from 50ms to 20ms. This also offloads a lot of the heavy caching that the MemoryStore was doing in conjunction with querying the Dialer. We will be monitoring in the days to come to compare performance with prior days and weeks.

What else did we do?

We beefed up the MemoryStore by 3x, and it is now performing swimmingly. We will be monitoring the MemoryStore’s performance in the days to come. We are also working on further optimizations for auto scaling the MemoryStore so that we can adjust for spikes like this in real time without any downtime. Both AWS ECS (the platform the CRM software runs on) and AWS RDS (the database) fortunately performed exceptionally well during this event, and we are happy with the scalability adjustments we have implemented for those parts of the system over the past year.

Any Other Fixes or Improvements from this Change?

  • Improved Ping Response Time
  • Reduced Cache Keys Required for Vendor Queries
  • Improved Primary Dialer Database Insert, Update, Select Performance due to Offloaded Queries
  • Reduced Ability for External Call Routers to cause Performance Issues
  • Always Real Time Results since we are plugging into the Same Data Agents use for Daily Operations.
  • Fixed some Possible False Positive Results for Agents in DEAD state.

The Future of Query-less TLD

We will be monitoring the results of this update, and if it performs reliably we will be making the following updates:

  • Internal CRM Agent State Checks for Users will be Updated to be Query-less.
  • Live Agents Will be Adjusted to be Query-less.
    • We may even be able to make this auto-update in real time without the refresh button.
  • Call in Queue List Checks will be Adjusted to be Query-less.
  • New Ingroup Availability Overview/Dashboard will be Exposed.
  • More Server Performance Stats on “Show Load Balance” in Live Agents
  • Mass Edit / Refresh Live Agents Refactor to be Query-less outside of Updates.

More Updates to Come!

  • Vendor Rework – The Big One we have been Working On
    • New Vendor Sources / Groups Tables with Saveable Filters
    • New Vendor Based API Ping ( That Requires Authentication ) for better tracking and control
      • Includes Customer Controllable Dialer Ready Parameter Settings, no more telling vendors to update the pings!
    • Robust Vendor Changelogs
    • Cleaner Vendor Edit Panels
    • At a Glance Better Insights on Vendor Configurations
    • Vendor Call Sales Attribution
    • Policy Vendor Attribution and Reports Adjustments
    • TLDialer Call Logs Call Sale / Policy Attribution
    • Updated AVR ( Advanced Vendor Report ) to work with TLDialer Call Logs and Call Cost Attribution.
    • Retreaver Integrations and Optimizations
    • Ringba Integrations and Optimizations

Stay Tuned!