I want to start by apologizing to anyone who was inconvenienced by the plugin endpoint issues we experienced last night. I know it can be really annoying when a system that has just worked for years has an unexpected anomaly that causes confusion and frustration. I am so sorry.
First, let’s talk about the actual issues users were experiencing. There were two of them:
1. The plugins were flagging licenses as invalid and displaying warnings in the WordPress admin. At a basic level, each PRO plugin has a license system that lets customers update the plugin while preventing attackers from downloading it. When a plugin tries to update without a valid license, an error is returned and a message lets the user know there is a license issue and the plugin can’t update automatically.
Last night, for a period of about 8 hours, every user on the legacy license system received an invalid license response and saw this message.
2. The Go Live Update URLs PRO plugin downloads were missing files, which caused fatal errors. At a basic level, each PRO plugin has an endpoint that builds the plugin’s .zip and prepares it for download. During updates, WordPress downloads the .zip and installs it as-is.
Last night, for a period of about 8 hours, the .zip for this plugin was missing files, and anyone who tried to install or update it received fatal errors.
The cause of the issue:
For all of our sites, including this one, we use a DevOps workflow for deployments. We complete micro-sprints and deploy changes very frequently. This week we have been deploying to this site daily.
Part of our deployment process uses Continuous Integration to automatically build libraries and distributions of code. I won’t get too technical here, but to understand the cause it’s important to know this process is automated. It completes without issue almost all the time (about 99%) and is typically not something we are concerned about.
For some of our libraries we use a tool called Composer to install and manage dependencies. This is where the process failed. During Wednesday’s deployment, Composer failed to install a required update on one server. Because it wasn’t a fatal error, the process continued and we were not notified. As a result, a critical file was missing from the plugin endpoint.
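The root problem was that a non-fatal Composer failure let the deployment carry on. One common safeguard is a post-build step that fails the pipeline whenever a required file is absent from the build output. The sketch below is hypothetical, not our actual pipeline: the file list is an assumption (`vendor/autoload.php` is the autoloader Composer generates; `src/endpoint.php` is a made-up name), as is the build directory layout.

```python
import sys
from pathlib import Path

# Hypothetical list of files the plugin endpoint cannot run without.
REQUIRED_FILES = [
    "vendor/autoload.php",  # generated by Composer
    "src/endpoint.php",     # made-up example path
]


def missing_files(build_dir: str, required: list[str]) -> list[str]:
    """Return the required relative paths that do not exist under build_dir."""
    root = Path(build_dir)
    return [rel for rel in required if not (root / rel).is_file()]


def main(build_dir: str) -> int:
    """Return a process exit code: 0 if the build is complete, 1 otherwise."""
    missing = missing_files(build_dir, REQUIRED_FILES)
    if missing:
        # Report loudly; a non-zero exit code stops most CI pipelines.
        print("Build is missing required files: " + ", ".join(missing), file=sys.stderr)
        return 1
    print("All required files present.")
    return 0
```

A CI step would run something like `sys.exit(main("build"))` right after the Composer step, so a silently skipped install turns into a failed deployment instead of a missing file in production.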
We realized this file was missing and fixed the issue fairly quickly (within 10 minutes actually). But there was a side-effect we didn’t account for.
Because we get something like 100 requests a minute to our plugin update endpoint, we have a rudimentary cache set up to respond to repeated requests and limit the traffic to one server.
During the 10 minutes that the critical file was missing, all requests using the legacy license endpoint were returned as invalid, because the endpoint hit a fatal error when retrieving the download URL. This invalid response was then cached by the endpoint and returned to all subsequent requests.
By this time it was 9 pm here, and we went home for the night thinking everything was back to normal, not realizing the failed response was still being served from the cache. Although the cache would have cleared itself within about 24 hours, the 8 hours it stayed in place were long enough for about 2% of our users to be affected.
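The cache problem described above is a classic case of caching a failure: an error response gets stored and replayed as if it were a normal answer. The hypothetical sketch below shows a simple time-based cache that, as one possible option, refuses to store responses produced by a server-side failure, so a 10-minute outage can’t be replayed for hours. The names (`TTLCache`, `get_or_fetch`, the `"error"` status) are illustrative assumptions, not our endpoint’s code.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical response shape: {"status": "valid" | "invalid" | "error", ...}
Response = Dict[str, str]


class TTLCache:
    """Caches responses per key for ttl_seconds, but never stores server errors."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the behavior can be tested without waiting
        self._store: Dict[str, Tuple[float, Response]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Response]) -> Response:
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and now - cached[0] < self.ttl:
            return cached[1]  # fresh hit: the backend is not called
        response = fetch(key)
        if response["status"] != "error":
            # Only real answers are stored; a failure is retried on the next request.
            self._store[key] = (now, response)
        return response
```

The trade-off is that during an outage every request reaches the backend, so the error path needs to stay cheap. It also only helps if the endpoint reports its own failures differently from a genuinely invalid license; in the incident above, the fatal error surfaced as an “invalid” response, which is exactly what got cached.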
Where are things now
All caches have been cleared and all issues have been resolved. We have been monitoring everything for the past couple of hours just to be sure, and everything appears to be back to normal.
Some users are most likely still seeing the “Invalid License” message because WordPress caches plugin update requests. Any site that made an update request during that 8-hour period may take a little while before it makes another request and gets a valid response.
What we are doing to prevent this in the future
We already have a strong automated testing process set up in our workflow, so it wasn’t much trouble to add a test for this issue against the production endpoint. Now, any time we commit code, it will automatically check for a regression of this (highly unlikely) issue.
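As an illustration of what such a check might look like, here is a hypothetical sketch that downloads the plugin archive from a production URL and confirms the critical files are inside it. The URL, plugin folder, and file names are placeholders, not our real endpoint or test.

```python
import io
import urllib.request
import zipfile

# Placeholder endpoint and file names, for illustration only.
DOWNLOAD_URL = "https://example.com/plugins/download?plugin=example"
REQUIRED_IN_ZIP = [
    "example-plugin/example-plugin.php",
    "example-plugin/vendor/autoload.php",
]


def fetch_zip(url: str, timeout: float = 30.0) -> bytes:
    """Download the plugin archive from the (hypothetical) production endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def zip_missing(zip_bytes: bytes, required: list[str]) -> list[str]:
    """Return the required paths that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
    return [path for path in required if path not in names]


def check_production_zip() -> None:
    """Raise AssertionError if the live download is missing any required file."""
    missing = zip_missing(fetch_zip(DOWNLOAD_URL), REQUIRED_IN_ZIP)
    assert not missing, f"Plugin zip is missing: {missing}"
```

Keeping the network call separate from `zip_missing` means the archive check can be exercised locally on a hand-built .zip, while the CI job calls `check_production_zip()` against the live endpoint after each deployment.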
Side note about licenses
You have heard me refer to the “legacy license” in this post. Previously, all plugins supported only permanent licenses. On May 26th, a new license system was introduced which allows both subscriptions and permanent licenses.
The important thing to note here is that the new system has an all-new endpoint and response process. To use the new system, you must have downloaded the plugin from either the original purchase email or My Account on or after May 26th, 2019. If you haven’t re-downloaded since then, you are on the legacy system.
We are currently in the process of phasing out the legacy system completely and expect it to be fully discontinued by the end of 2019. If you haven’t yet, it’s not a bad idea to head over to My Account and start updating your site(s) with a fresh install of your plugins.