I was watching this funny Brent Ozar video about what's new in SQL Server 2019 when I realized "Okay, we are doing the same!".
The video explains how Microsoft, once it started hosting SQL databases in Azure, realized how awful queries are run by the users. They got stats and found out that the SQL databases are hit by a swarm of non-optimal queries with strange parameters and strange uses.
Now, all of this happens to us too!
When we design the system, we anticipate some kind of "normal" usage. But the reality is that users are using and abusing things in all possible ways. We do test everything in a lab environment, but when the actual users hit their keyboards, strange things happen. Unpredictable things. Sometimes it is totally our failure - we should have guessed that! Sometimes, you just can't guess what the users would do.
The good things is that, by using our hosted services, we do get stats. Stats are generalized - they don't contain anything specific about the actual usage. What they do tell us is WHERE there is a problem. For example - the "Calculate ATP" proc gets hit way more than expected. Or perhaps it takes more than the expected 50-100 ms (remember, it is a real-time ATP calculation, which is done by almost no other ERP system, so 50ms is pretty good).
Back to Microsoft story. They realized that instead of trying to teach every and each user how to format their queries to perform better, they can just *optimize* the way certain bad queries run.
Now, if I compare this to us: We actually *do* try to teach everybody how to make their queries (or most often - their calculated attribute formulas) run better. We do create back-tickets to the users telling them that this or that is just not done in the best way. But this is not always possible. It takes time, sometimes the person who created (for example) the calculated attribute is no longer around, or there is nobody who has interest to invest time, etc., etc.
At the end of the day, there are just so many queries that do hit our servers and take precious CPU time. And CPU cycles at the SQL machine are the most precious resource out there in the data center. After all, we do pay SQL licenses by CPU cores. But we often license them by user to the end users. So, whatever the user runs, if it takes a lot of CPU time, it is at our expense.
So, now we have common goal. The users wants their queries to run faster. We want to save our precious SQL server from taking any time at all, so that we pay less licenses. What we do is that we take those generalized statistics and try to figure out why the top CPU-eating queries are so slow. In doing so, we often end up finding a way to optimize them.
The optimizations come in different forms and shapes. The query might be a very fast query, but it can be run many times. Or it can be a slow query, run only a few times. We look at
Total Time = Average Time * Number Of Executions
So, if a query is a slow one, and we optimize it, the user might suddenly notice "Wow! My query now just runs faster! How nice!".
However, if we optimize a query, which is run often, but each execution is usually pretty fast, the end user might not even notice! Lets say that every time a user is doing this operation, he saves 1 second (out of, lets say 10). He/she might not even notice the "optimization". However, it accumulates. When many users are doing this over and over, savings can be huge. And when the users save time, this lowers the cost of operation of the whole company.
So, every company that uses our servers in strange and new ways helps the product to become just a little bit more optimized. And in time, things get better and better.
NOTE: Such things can be done for on-premise too. Some stats can get collected and even generalized. Some cannot - just because they are not collected by the software itself, but by the data center setup. Also, it is almost impossible to compare all the different environments the users can use to run the software. Also, we do know that there no configuration errors, that all services are online, etc. So, doing this in a standardized DC environment is pretty straight forward, while trying to make sense of all the server environments users can setup can quickly get messy.
0 Comments