# Sparky learning SQL



## mburtis (Sep 1, 2018)

I'm starting to think I have a real sickness: I can't seem to limit myself in the topics I want to learn about, even when I don't really need to know about them. In no way do I want to end up a full-time programmer or integrator or IT guy, but here I am dabbling in PLC and HMI programming, and today I installed MySQL and started playing with it.
Seems SQL-based databases are becoming more and more common for storing and organizing PLC data. I hate computers, but I hate outside d-bag IT guys more. Can I still call myself an electrician as long as I point at stuff with screwdrivers?


----------



## joab (Dec 28, 2019)

I used to really love writing SQL. I haven’t done any for a while, but something about writing queries just clicked for me.


----------



## paulengr (Oct 8, 2017)

First off, look at W3Schools for the basics. You can learn SQL in an hour or two there.

Second, the basic idea is that you send it a text “query”…really a lot like a program. It responds with formatted data. Find a small SQL query program so you can see what is going on.

Third, you are dealing with a language from the 1970s. The “standard” wildcard symbols * and ? that Microsoft copied from Unix weren’t universal yet, so you get different symbols and names for things that you don’t expect. Today’s languages often have a lot in common, so you can pretty much “read” even languages you have never seen.

Fourth, SQL is about 180 degrees from every other typical computer language. In a typical language you tell it step by step what to do. With a lot of data you put the steps in a loop or multiple loops. In SQL you sort of start with “everything” and then filter and slice it up until you get just what you want out of the data. You don’t specify the steps, and if you do it will be SLOW. The query is fed to a compiler that decides the best way to do it. If you ask for the same thing it has run before, it just returns cached results. If there is an index on the table that it can use, the query comes back very fast. What you never want to happen is for it to actually do a loop looking for results, especially a nested loop (SLOW). SQL is so different from other languages that I find myself having to switch gears. It’s like learning, say, Spanish. It works great once you can “think” in Spanish, but translating back and forth is a mess.
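To make that concrete, here's a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real SQL server (the table and values are made up). Notice there is no loop on our side; we just describe the result we want and the engine decides how to get it:

```python
import sqlite3

# In-memory database as a stand-in for a real SQL server (hypothetical data).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE readings (tag TEXT, value REAL)")
con.executemany("INSERT INTO readings VALUES (?, ?)",
                [("temp", 71.2), ("temp", 95.8), ("press", 14.6)])

# Declarative: describe WHAT you want, not HOW to find it.
rows = con.execute(
    "SELECT tag, value FROM readings WHERE tag = 'temp' AND value > 80"
).fetchall()
print(rows)  # [('temp', 95.8)]
```

With an index on `tag`, the engine would use it automatically; the query text doesn't change.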

Fifth SQL makes tables or more accurately sets of data available. Good for searching data but as far as presentation goes, not so good. That’s where a reporting package comes in.

Finally, the implementations in HMI systems and data historians are total crap. You are better off using a real SQL database in most cases. The worst one by far is Wonderware. To begin with, it’s nothing like a real SQL database. You actually query a Microsoft SQL database. Then you use a sort of “raw query” interface to connect to a “remote” database, passing the query through as raw text. The “remote” database returns a SQL table back to Microsoft SQL and eventually back to you. You can do some processing (looping) only on the Microsoft side of things. The only queries the Wonderware garbage supports are effectively trend chart data, and that’s it. Don’t expect averages, totals, searches, min/max, or being able to group or summarize things by, say, hour/day/month. That’s what a real SQL database does. To be fair, PI does some of this. They do it by using a “calculation engine” that runs queries in the background, preprocessing the data so that you can query the results.
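The kind of hour/day/month summarizing a real SQL database gives you is a one-line GROUP BY. A sketch (again sqlite3 standing in for a real server; timestamps and flows are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE trend (ts TEXT, flow REAL)")  # hypothetical trend log
con.executemany("INSERT INTO trend VALUES (?, ?)", [
    ("2024-01-01 08:15", 10.0),
    ("2024-01-01 08:45", 20.0),
    ("2024-01-01 09:10", 30.0),
])

# Summarize by hour: average and max per hour, the kind of query a plain
# trend-chart historian can't do. substr(ts, 1, 13) keeps "YYYY-MM-DD HH".
rows = con.execute("""
    SELECT substr(ts, 1, 13) AS hour, AVG(flow), MAX(flow)
    FROM trend GROUP BY hour ORDER BY hour
""").fetchall()
print(rows)  # [('2024-01-01 08', 15.0, 20.0), ('2024-01-01 09', 30.0, 30.0)]
```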

The reason for this historian nonsense is the claim that SQL is “slow”. To be fair 10 years ago SQL maxed out at about 250 data points per second write speeds. It has only gotten better. The early versions of the Ignition HMI proved SQL is plenty fast. Originally that was the entire system. SQL was the tag database. The only choice you had was to edit the table in place (real time data) or to log historical values. The concept behind a historian is to be able to just “log everything”. So 10,000 data points per second or more is pretty typical. But as I said it’s really just a trend chart system. The problems I’ve found with historians are:
1. Lots of “noise”. It logs tons of data, but actual information is sparse. You can’t, for instance, query say the highest reading in the past year. You have to query ALL readings in the past year and then search that data, whether in SQL or something else. Most of the data is thus useless.
2. Poorly thought out data. Usually what gets logged are the obvious key process variables and little else. So if you wanted to, say, query the state of a valve over the past month, it’s easy to do IF someone logged the valve data. So usually what you need isn’t there unless someone “logs everything”.
3. Historians promote the “log everything” approach because it’s the one thing they do well. But often it’s useless. For instance, say you want to count machine cycles. That’s easy IF there is a cycle counter that gets logged. But if nobody did this, you can’t count cycles. Much harder is, say, looking at settings for each batch, looking for trends between, say, QC results and batch settings. This is trivial in general-purpose databases but impossible in a historian.
4. So my experience has been that if you spend a little time thinking up front about the kinds of things the HMI should be logging, like batch records, process data including time and other results, and operator electronic logs, then you will greatly increase the usefulness of the system, log far less data (noise), and do things a historian simply can’t do. I mean, it’s so hung up on trend charting it can’t even give you, say, downtime data. This applies not only to Wonderware but even to PI, which is the best one out there.


----------



## mburtis (Sep 1, 2018)

Thank you for the lead to W3Schools. Going through it now. I watched and followed along with an 8-video series on YouTube from RealPars that helped a lot too. The scary part is that it doesn't seem that hard to interact with. Setting up the databases seems like the hard part, especially for industrial-type data. At some point I'll have to figure out how to get data from the PLC into it. Then find a reporting package that can extract it and do something pretty with it. So much to learn.


----------



## paulengr (Oct 8, 2017)

mburtis said:


> Thank you for the lead to W3Schools. Going through it now. I watched and followed along with an 8-video series on YouTube from RealPars that helped a lot too. The scary part is that it doesn't seem that hard to interact with. Setting up the databases seems like the hard part, especially for industrial-type data. At some point I'll have to figure out how to get data from the PLC into it. Then find a reporting package that can extract it and do something pretty with it. So much to learn.


There are a couple PLCs now that can do some things natively, but for the most part no PLC will directly do anything with a SQL database. I’ll give you examples of how I do it. All of them involve an HMI on a server. This is preferably a server that is used for something else, because if it does nothing but passively collect data, a crash can go unnoticed and you lose important data.

All HMIs have some sort of way to trigger an event and/or log data to a database such as SQL, or worst case run an external program. This is how you load data. So with time-based data it’s easy. Say I want a daily report of total effluent discharge from a sewage plant. I just set up the PLC with a running totalizer. Use the largest integer you can so it never rolls over. In SQL you can query the running total daily, then just subtract to get the total for that day. You can do that externally in say Excel, or in SQL using a correlated subquery. You can also save a daily number in the PLC, but I don’t recommend this and here is why. Say the program gets overwritten/downloaded or the HMI goes down. With a running totalizer you can easily spot this, and worst case you still know the total even with data loss. It may give you a two-day total, whereas if you don’t do running totals the data is gone forever.
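That daily-total-by-subtraction can be sketched as a correlated subquery (sqlite3 standing in for the plant database; table name and totalizer values are made up). Each day's usage is the running total minus the previous day's running total:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE totals (day TEXT PRIMARY KEY, running REAL)")
con.executemany("INSERT INTO totals VALUES (?, ?)", [
    ("2024-01-01", 1000.0),
    ("2024-01-02", 1450.0),
    ("2024-01-03", 2100.0),
])

# Correlated subquery: for each row, look up the most recent earlier day
# and subtract its running total. The first day has no predecessor (NULL).
rows = con.execute("""
    SELECT day,
           running - (SELECT running FROM totals AS p
                      WHERE p.day < t.day ORDER BY p.day DESC LIMIT 1)
               AS daily_total
    FROM totals AS t ORDER BY day
""").fetchall()
print(rows)  # [('2024-01-01', None), ('2024-01-02', 450.0), ('2024-01-03', 650.0)]
```

A gap (the HMI down for a day) would just show up as one larger total, which is the "two-day total" failure mode described above.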

A little more robust is to write the running total to “yesterday” in the PLC at midnight and have the HMI grab the data at say 2:30 AM (avoid daylight savings). Then if someone finds it failed to collect data you can manually trigger it in the morning with no data loss.

OK, so what about downtime? The naive approach is to log, say, the “running/stopped” bit once a second off the machine. Then it’s quite simple to produce a report showing the number of seconds or minutes (divide by 60) down by asking SQL to count seconds when the state is “stopped”. But say you have 30 minutes of downtime. Is this one 30-minute incident, or a bad production day with 30 one-minute hiccups over the whole day? You can find it manually, and there are region-growing algorithms to find this, but they’re slow and hard to work with. What we need is one entry per downtime event. This is a segue into my next method.

Say it’s a machine that runs a cycle. Put a cycle counter in the PLC. Also copy any and all data about that cycle into “last cycle” data. Make sure the LAST thing you do is update the cycle counter. This is the trigger for the HMI to log the cycle counter and the data, and usually time stamps are automatic. A quick correlated subquery on time stamp differences, filtering out all cycles less than an acceptable cycle time, gives you a list of machine downtimes. If you then augment it with, say, a web form, operators can annotate it with the reasons. With drop-down boxes for reason codes you can Pareto chart it. This removes all guesswork and fudging of data. On a continuous machine you can do the same thing every time the machine starts and stops, triggering when the start/stop bit changes state instead of triggering based on time.
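A sketch of that timestamp-difference query (sqlite3 as a stand-in; the cycle log, the timestamps, and the 120-second "acceptable cycle time" are all invented for the example). Each cycle's gap back to the previous cycle is computed with a correlated subquery, then anything over the limit is flagged as a downtime event:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cycles (n INTEGER PRIMARY KEY, ts TEXT)")
con.executemany("INSERT INTO cycles VALUES (?, ?)", [
    (1, "2024-01-01 08:00:00"),
    (2, "2024-01-01 08:01:00"),   # normal cycle, ~60 s
    (3, "2024-01-01 08:31:00"),   # 30-minute gap -> one downtime event
    (4, "2024-01-01 08:32:00"),
])

# Gap from each cycle back to the previous one, in seconds; keep only gaps
# longer than the acceptable cycle time (120 s here, an assumed limit).
rows = con.execute("""
    SELECT n, gap_s FROM (
        SELECT n,
               CAST(ROUND((julianday(ts) -
                    (SELECT julianday(p.ts) FROM cycles p WHERE p.n = c.n - 1)
                   ) * 86400) AS INTEGER) AS gap_s
        FROM cycles c)
    WHERE gap_s > 120
""").fetchall()
print(rows)  # [(3, 1800)] -- one 30-minute event, as one row
```

One row per event is exactly what you'd hand to the operators' annotation form.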

You can combine these too. So if you say log oven temperature and a batch code or counter a simple trend chart query can display the oven temperatures over a batch run.

If you have operator or automatic logs of some kind of serial number or batch number since other data is often in a SQL database you can do wild things like link the QC and PLC databases to help solve problems with quality results. Or using even simple batching ideas you can log when a valve is told to open or close and when the feedback occurs, storing a timer total and a cycle counter from the PLC. You can trend this data to detect when valves are getting plugged up or wearing out.

So like the last post…when you are thinking in this way the PLC becomes a treasure trove of information about your system. You just have to set up some very simple instrumented code in the PLC, and add a table, logging entry, and event in the HMI.


----------



## just the cowboy (Sep 4, 2013)

@paulengr is spot on. 
We store daily, yesterday, and monthly values so they can be re-pulled if needed.
Some other tips:
Only pull the data you need, or it can get cumbersome.
Have a reason to pull that data point.
*Don't overstretch yourself, reports and databases can be time consuming!!!!*
We only build the tag database and pass it along; someone else makes, runs, and manipulates the reports.


----------



## mburtis (Sep 1, 2018)

@just the cowboy , can you share any details on how your SQL database is structured? Since we are in the same industry I'm always interested in how you guys do it. I did a little poking around in our existing system and it didn't seem to make much logical sense, with lots and lots of separate databases etc.


----------



## just the cowboy (Sep 4, 2013)

mburtis said:


> @just the cowboy , can you share any details on how your SQL database is structured? Since we are in the same industry I'm always interested in how you guys do it. I did a little poking around in our existing system and it didn't seem to make much logical sense, with lots and lots of separate databases etc.


Our system is older, for now.
Our HMI "Iconics" will generate a tag that goes through Kepware into the OPC server.
We pass that OPC server into our data diode's internal OPC server on the blue side (safe side).
The data diode will copy the OPC server to its own OPC server on the RED side (open side).
We use a toolbox program to get that into a SQL database.
"ReportWorks" then pulls that data as needed into a report.
An office worker will then pull that report into Excel with an add-in, manipulate it as needed, and publish that report.
The manipulation could be done in ReportWorks, but it is old and clunky; besides, Excel is easier and better understood by most.

New way will be easier.
Factory talk to PI
Pi report thru datadiode.
Done.


----------



## mburtis (Sep 1, 2018)

@paulengr wanted to thank you for your input on this topic. I may not exactly understand everything when you post but it always leads me down avenues to research and learn. 

@just the cowboy thanks for sharing your system. Also looking forward to your input since you're also in the water industry. 

I have a lot of work ahead of me to begin improving our data collection and analysis, but like any elephant one bite at a time. 
Our existing reports and data visualization are pretty crude. The only data you can access is what's in a pre-configured report from FactoryTalk, and this just prints the data to an Excel sheet. There are the trends in FactoryTalk, but again, unless it's already set up it's a major project to modify or add anything.

I know we log a bunch of stuff into SQL but nobody knows how to even open it except for the pre-configured stuff.


----------



## SWDweller (Dec 9, 2020)

The biggest problem with databases is what you expect from them.
I had an energy database organized by building, by month, and by year, for more than 40 buildings.
I did that because I knew management would want to compare this year to last or extrapolate into the future. Management was all tied up with heating degree days.
This made the database huge; in fact I had space on the mainframes to keep the data safe and to keep it searchable.
PCs have gotten a lot faster and PC hard drives are a lot faster nowadays.
I found that collecting data and searching data with the same PC did not work well.

Paul had a great post which I agree with. 

It boils down to what you collect and what report you need/want.
Example:
Collecting kW demand faster than your utility does is a waste of performance and space.
Drawing a current chart with 5-minute periods makes a smoother chart, but makes the database grow faster than it can be managed at times.

Just because you learn databases does not mean you're not an electrician.
I needed a printer fixed at a mine I worked at. The IT department was busy that day...
I demanded a temp admin login from their office and said I would fix it myself.
The boss laughed, gave me the login, and in 10 minutes all of the printers in the electric shop were fixed for all of us to use. When IT realized what I had done they offered me a job.
I declined as it was 20 grand a year less.
Always good to keep learning


----------



## mburtis (Sep 1, 2018)

As you guys have brought up, optimizing what data to collect and how often to collect it is going to be a very important part. I know that a lot of our stuff was set up without much thought to this. We have a lot of "log everything as fast as possible" going on, creating a lot of pointless data. What I would really like to figure out is an elegant way to log/trend data at relatively slow rates as long as the value stays between an upper and a lower setpoint, and then have the sampling rate change if the value goes outside the limits. That way you could capture highly detailed information during spikes and upsets but not waste data storage during normal operation.
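That idea can be sketched in a few lines. In practice it would live in the HMI's logging configuration or a small collector script; the band limits and the two sample rates below are made-up values:

```python
def next_sample_period(value, lo=10.0, hi=90.0, slow_s=60.0, fast_s=1.0):
    """Adaptive sampling: slow rate while the value stays inside the normal
    band, fast rate as soon as it strays outside (all limits hypothetical)."""
    return slow_s if lo <= value <= hi else fast_s

# Simulated readings: normal, then a spike, then back to normal.
readings = [50.0, 52.0, 95.0, 97.0, 55.0]
periods = [next_sample_period(v) for v in readings]
print(periods)  # [60.0, 60.0, 1.0, 1.0, 60.0]
```

During the spike you'd log every second; the rest of the time, once a minute.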


----------



## just the cowboy (Sep 4, 2013)

mburtis said:


> As you guys have brought up, optimizing what data to collect and how often to collect it is going to be a very important part. I know that a lot of our stuff was set up without much thought to this. We have a lot of "log everything as fast as possible" going on, creating a lot of pointless data. What I would really like to figure out is an elegant way to log/trend data at relatively slow rates as long as the value stays between an upper and a lower setpoint, and then have the sampling rate change if the value goes outside the limits. That way you could capture highly detailed information during spikes and upsets but not waste data storage during normal operation.


That is called log-by-exception data.


----------



## SWDweller (Dec 9, 2020)

I did a job where we tied 5 hospitals together, energy-data-logging wise.
I got to a hospital on a military reservation in southern Cali and the Decider wanted 100 points every 5 minutes on each meter. I politely told him that this was a bad idea and none of the other hospitals I had done were like this. He gave me the "more data, better graph" rationale. So I wrote everything down and proceeded to set up the monitoring and collecting per instructions. Left for the evening, had a great breakfast, drove the 40 miles to the site. Produced my ID and was met by 2 young gentlemen with helmets and shoes that you could see your reflection in.
"The general wants to talk to you!" OK, I said I knew the way, and was instructed to drive between the two jeeps. Fine. Got to the general's office; he was livid. He calmed a bit and started asking questions. "I know you have done two of these and there were no problems; that is why YOU were chosen to be HERE." I explained about the guy wanting more data, provided the name and phone number. I then asked what the problem was. The general asked me, "Did you know you were going through a firewall with the data?" Yes, all of the other ones did. "Except this time you crashed the firewall. Did you know where the firewall was located?" Somewhere in Idaho.

Well, it is a nuclear weapons depository and the little red light turned on the big board at SAC command, Thunder Mountain. I was then asked to set this monitoring up like all of the other ones I had done and not take any instructions from anyone over the phone.
The general turned out to be a lot more than just a Dr. in charge of a hospital. Nice guy; I was invited to breakfast in the mess with the general for the rest of my stay. Never really got over the shiny helmets and shoes. Out in the desert it had to be a bitch to keep them like that.

I will not say that it is easy to choose what you want to keep track of. A lot has to do with the process of the place you're working at. In the long run I found that electrical events, voltage, and current were better handled by separate devices which stored that information and could be moved to a graphing machine when needed. The single most important value that I found was peak demand. Everyone was concerned about that; some places actually were a bit crazy and would get within 1,000 kW of the magic number. I built a screen that had the red line, and the graph below would fill in green, yellow, red, and then flashing red. At one place, if you broke the bubble, it was $40k a meg per day for the rest of the month. Trick to the red line: have the screen say what it is, but have the value in the background 500 kW less. Your equipment may or may not be as quick as the utility's. Best to catch the event before it happens and give operators a chance to make some changes.
I always enjoyed doing these jobs. I had to have a graphics guy do the screens for me as I am a one-line kind of guy. I always copied the screens onto other media so I could make visual changes offline.
Some of the systems I started up had run-time hours on the starters, as well as trip notification. I found I could use the run-time hour logging as preventive maintenance on pumps and fans. Trips on the starters were not as helpful as trips on the breakers; some of the breakers would give you LSIG in separate registers. That was helpful.


----------



## paulengr (Oct 8, 2017)

I didn’t get into it, but in larger/secure systems you need separation of networks. The easiest way is to have 3 layers. The PLCs sit at the lowest layer. In fact it’s often nice to have another layer below that where each PLC has its own I/O on a separate physical network. If you need DNS or login authentication, such as for HMIs, it’s at this layer. There is a firewall between this and the next layer (DMZ). The firewall is configured so that ONLY PLC ports and protocols are allowed. In the next layer up are the database servers, typically. They collect data from the PLC network and support SQL queries. Programming PCs for PLC maintenance also live in this layer. Then there is another, completely separate firewall. This one allows ONLY PC protocols, particularly remote access for server maintenance and SQL calls. The next layer is the office LAN. This way you cannot access the PLC network (factory floor) directly; everything must go through one of the servers in the DMZ. All communications between the office and plant floor must go through authentication and at least a protocol change. Another variation only has an OPC server in between, and if you have a single physical server you can theoretically set up the firewall stuff in software on the same machine, but penetrating this one server leaves the entire network open. Having separate firewalls, especially if, say, the electrical department controls one and IT controls the other, severely reduces the security risk.

But that’s all networking and security details.

The trick with setting up data collection is you need to start with what information you need or think you will need. This starts with process knowledge. Knowing that you structure your data collection in that way.

Another database to consider is the alarm log. If properly configured, this is also a huge source of troubleshooting data. But again, there is a challenge with it. Do NOT just log alarms. Also log every event. Depending on the operators you may even log every button press. Log every auxiliary contact (starting/stopping). Just set these extra “alarms” up as events, but configure the regular alarm view so you don’t see them. Make them auto-reset. That way they don’t show in the alarm view, but you can see them in the history. So you can know that the operator flipped the pump to manual control and pressed Auto Start. Then the tank overflowed. Then the operator pressed E-Stop, then realized the mistake, flipped the pump back to auto, and attempted to blame the level sensor. These event logs are tremendously useful for establishing a sequence of events. Again, just logging time-series data isn’t so great, especially since a button press only lasts milliseconds.
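Once those events land in a table, reconstructing a sequence of events is just an ORDER BY on the timestamp. A sketch (sqlite3 stand-in; the tag names P-101/TK-1, the times, and the event texts are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (ts TEXT, source TEXT, event TEXT)")
# Rows inserted out of order, the way events arrive from different sources.
con.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    ("08:03:40", "TK-1",  "High level alarm"),
    ("08:00:01", "P-101", "Mode switched to Manual"),
    ("08:00:05", "P-101", "Auto Start pressed"),
    ("08:03:55", "P-101", "E-Stop pressed"),
])

# Replay the sequence of events around the alarm, oldest first.
rows = con.execute(
    "SELECT ts, source, event FROM events ORDER BY ts").fetchall()
for ts, source, event in rows:
    print(ts, source, event)
```

The ordered history makes the operator's story (manual mode, auto start, overflow, E-Stop) obvious at a glance.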


----------



## mburtis (Sep 1, 2018)

I appreciate your insight into the security and networking side. Our system security is pretty basic. We used to have our own standalone firewall and internet. Then we had some issues and the corporate IT guy got involved, and now we are running through his firewall at City Hall. He likes to play the hero but then turns into a prick when you call him because it's not working. One of the reasons I'm dabbling in this stuff is so I have a small chance of keeping the place running when his stuff screws off, or at least being able to understand the computer garbage lingo he spews. We don't have the SCADA system linked with the office computers or anything; literally everything resides in one computer (well, and a virtual machine). The remote access laptop is the only reason it's even connected to the outside world.


----------



## mburtis (Sep 1, 2018)

Being water treatment, knowing some of the data to collect is easy. We have several monthly reports that are required by EPA for regulatory compliance, and some data that we are required to save for years. Those are obviously the places to start with optimizing and improving; everything else is operational data. Luckily I started as an operator and actually still am an operator with a level 4 treatment license (the highest in Wyoming), so I understand the plant process and what sort of data is important. The discussion about saving data from batch-type processes really has me thinking about our filter cycles. We are a continuous process, but our filters could be considered batch-type since we have to backwash them every 48 to 72 hours. There may be opportunities to greatly improve the information from our filter reports instead of just printing out 4 pages of timestamped turbidity readings.


----------



## SWDweller (Dec 9, 2020)

I interviewed for a job at an aluminum treating facility. I noticed that their factory processes and the front office shared the same network. When asked about communications in the interview, I said that they needed to remove the network hub and replace it with a bridge, separating the two systems. Factory processes with PLCs can get real chatty on the network. The extra load of office people will push the system over the edge, crashing the network equipment. Which happened during my interview. The decision maker of the facility dismissed me and I never heard from them again. I heard through the grapevine that he was replaced by someone from the home office.
It was an awfully hot environment; glad I did not get the position. Lots of ruined production that day.

Water guys were always my favorite folk. They were professional about their jobs and plant.
It helped, in my opinion, that most of them are licensed by the state.

A thought for you: consider PLCs or smart relays which will continue to run if there is a network issue for the critical subsystems. You will lose data, but the processes continue.
I have seen what happens when the network causes the primary pumping to shut down.
It is certainly not pretty, and it took me several minutes to figure out what had happened.
Explaining it to the customer was not easy, nor were they satisfied with the answers they got in the beginning.


----------



## mburtis (Sep 1, 2018)

That is the nice thing: for the most part, everything can crash and the PLCs and plant just hum along doing their thing. One thing about water though: if you have an upset, it's not lost production in the sense of money... it's public safety.


----------



## paulengr (Oct 8, 2017)

Water plants typically have individual PLCs at each station or process, plus some kind of semi-manual backup system, plus various emergency pumps and such. Triple backups are almost a minimum. Wastewater especially is all gravity through the collection system. No such thing as a “shutdown”.

EPA and state do allow some loss of data (maintenance time) but if you need to there is another way around data loss but it’s time consuming to set up. Create a “circular buffer” in the PLC. So in this case all data is written into records in an array (table). There are two pointers into the array. Every time the PLC writes new data it advances the “head” pointer and the queue grows. Every time the second “tail” pointer is not equal to the “head” pointer the HMI advances the tail pointer and reads the data. When either one reaches the end of the table it “wraps around” (reset to zero). This allows potentially dozens to hundreds of data points to buffer in the PLC allowing hours to days of communication loss with no data loss. It works extremely well but there is a lot of programming involved to get it all working correctly and it’s really only needed if you can’t lose data under any circumstance.
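The head/tail scheme above can be sketched like this (Python standing in for PLC logic; the buffer size and the flow values are made up, and overflow handling when the queue fills is omitted):

```python
class RingBuffer:
    """Sketch of the circular buffer described above. In the real thing this
    array and both pointers live in the PLC; the HMI only advances the tail
    as it drains records into SQL. Overflow handling is not shown."""

    def __init__(self, size):
        self.buf = [None] * size
        self.head = 0   # next slot the PLC writes
        self.tail = 0   # next slot the HMI reads

    def write(self, record):              # PLC side
        self.buf[self.head] = record
        self.head = (self.head + 1) % len(self.buf)   # wrap at end of table

    def drain(self):                      # HMI side: read until caught up
        out = []
        while self.tail != self.head:     # queue non-empty
            out.append(self.buf[self.tail])
            self.tail = (self.tail + 1) % len(self.buf)
        return out

rb = RingBuffer(8)
for flow in (10.5, 11.2, 10.9):   # comms down: three records queue up
    rb.write(flow)
drained = rb.drain()              # comms restored: HMI catches up
print(drained)  # [10.5, 11.2, 10.9] -- nothing lost
```

The PLC version is the same idea in ladder or structured text: two pointer registers, an array of records, and modulo arithmetic for the wrap-around.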

Also running two identical SQL servers really helps with data loss along with regular old backups. So as an example build up two computers. Put one in the control room and the other in the lab. Keep the HMI running on both.

Also consider fanless PCs with solid state drives. This is cheaper than you may think. These aren’t very powerful computers but they don’t need to be. But with zero moving parts they usually outlast traditional servers and PCs.

FactoryTalk View does everything you need, but you need the standalone or networked SE version, not ME. Just configure one log per event and use events or the built-in timing to trigger entries into the SQL database. Then to view data, configure SQL queries in Excel on some tabs, then nice formatted reports on other tabs.


----------



## mburtis (Sep 1, 2018)

Products & Technologies for Industrial Production | Softing (data-intelligence.softing.com)
This is something that the contractor who normally works on our PLCs ran across and sent to me. No idea how powerful etc. it is, but man, they make it look easy to pull some data. No idea how expensive it is either; seeing as it's made to work with Allen-Bradley, I'm sure they are proud of it.


----------



## paulengr (Oct 8, 2017)

mburtis said:


> Products & Technologies for Industrial Production | Softing
> 
> 
> 
> ...


Something similar from GE Fanuc is $40,000. AB sells RS Metrics which is similar.

The big key with these types of products is that you are buying a bunch of prebuilt templates. The configuration you do is mapping the PLC tags to the templates and keying in some information such as how many product lines you have. However, what they don’t tell you is that all of these products have a particular plant model. If your plant doesn’t match the model, it doesn’t work. And you will spend a lot of time adding monitoring code to the PLC to get things into the order expected by this software. And if you want to do some kind of analysis not supported by it, it doesn’t work.

If you go this route you need to read over every bit of documentation carefully to fully understand what tags it is looking for and how everything is structured. It is rare that these template packages actually work as advertised, and often they don’t work at all. By the time you work around them you may as well have done the work yourself without the templates. For instance, I developed my downtime data collection by reading the RS Metrics manual and realizing that by the time I did what it needed to work, I could do my own. And it only worked for production lines making things like automotive parts. The moment you had a continuous process or couldn’t track product flow, it broke down. So I got a reports package and did the rest in house.


----------



## mburtis (Sep 1, 2018)

I've looked at it very briefly. Mentioned it mostly in case it would help someone else out. My thoughts on it were similar to yours, in that it probably works great for a few certain instances but becomes a giant PITA as soon as you get outside of those examples. Plus then you're locked into a bunch of proprietary software etc. If I could bring myself to accept that I'm a nerd, I would try to learn Linux so I wasn't forced to use garbage Windows, but I'm not quite there yet.


----------



## paulengr (Oct 8, 2017)

mburtis said:


> I've looked at it very briefly. Mentioned it mostly in case it would help someone else out. My thoughts on it were similar to yours, in that it probably works great for a few certain instances but becomes a giant PITA as soon as you get outside of those examples. Plus then you're locked into a bunch of proprietary software etc. If I could bring myself to accept that I'm a nerd, I would try to learn Linux so I wasn't forced to use garbage Windows, but I'm not quite there yet.


Funny. I worked with the predecessor to Linux (Minix) and lots of BSD systems such as Sun (source of Java). Linux at first was a big improvement but I reverted to Windows once 98 came along. I mean it just got frustrating having to recompile things all the time and all the compatibility problems. Windows was a dog and a virus sucker but I had fewer problems with it.

Moving forward, I then bought a brand-new laptop that came with Windows Vista. All I can say is I bought one with screaming fast hardware for the time, but it was an utter dog. Slow as molasses. I tried hardware checking, everything. Finally, on the advice of an IT friend, and knowing Linux is great at diagnosing hardware problems, I downloaded it onto a USB stick, rebooted, and…wow! That was the screaming fast PC I bought. OK, but then there is the compatibility thing. So when I opened a file or went to a web site, it popped up a message that was something like…hey, I see you are trying to do X. I don’t have the software loaded for that, but I can install it for you, OK? Wow! Windows never did that. Keep in mind this was before packages and “app stores”. Everything just worked, every time. I could even run a lot of Windows stuff. I went to permanently install it dual-booted and it corrupted the Vista partition. I was kind of mad about that but never looked back!

Today I run it as the base OS. I run legacy Windows software like PLC software inside VMs. It actually runs faster than Windows by itself. And if Windows crashes I can just reboot it in seconds…much faster. And I have few compatibility problems because each Windows program is in a separate, isolated VM. And I still have my nice fast laptop with no software compatibility issues, able to run Unix tools and, say, Wireshark natively. It's a dream setup.


----------



## mburtis (Sep 1, 2018)

@paulengr curious as to what Linux distros you use? I broke down yesterday and loaded Ubuntu onto an old laptop we had around the plant just to start playing with it. Thought I might play with some of the other ones (Debian, Mint) in virtual machines to try them out. The reason I'm interested at all in Linux is just for the stability aspect of it, well, and I like the philosophy of open-source software, looking at it from an industrial side as far as running a database, etc. I'm not exactly super knowledgeable about computers, but even I know that most industrial software likes to completely crash every time Windows updates. We have one SCADA computer that blue-screens every time you restart it, and has from day one. Seems like Linux can offer a more robust system if a person wants to learn just a little bit.


----------



## splatz (May 23, 2015)

If you want to play with linux distros, I highly recommend making some boot CDs / boot USBs. Far less commitment. The distro that you want for your desktop may not be the one you want for your server. The desktop environment etc. is not really important for the server. Redhat for example is popular for servers but not so much for desktops. 

A few years ago I was really impressed with some of the Linux distros' efficiency. You could run some of the leaner Linuxes on hardware that was marginal for Windows XP - XP would run but painfully slowly, say on a 512MB RAM system - and Linux would perform well. Some had such a small footprint you could boot and run entirely from RAM, and those old clunkers would run really fast. But the Linuxes have mostly bloated along too, just a generation of bloat behind Windows.

But Windows bloat has somewhat peaked; I don't think there was all that much added from XP to 7, and I think 10 is actually leaner than 7. It's now easy to throw enough hardware (RAM and a solid state drive for the OS) at Windows to make it perform well. In other words, slightly better efficiency is not a big deal any more.

I have been using Unix for a very long time, and followed Linux from the beginning, and I really don't find it superior to Windows 10. No lie. With Windows 10 I think Microsoft has finally pulled ahead of MacOS. MacOS IMO has always been a better Unix than Linux. 

I usually keep quiet because Linux devotees are basically religious zealots, it would be easier to talk them out of their gender than talk them into Microsoft. But as long as I am cocking off ...

Linux and mySQL are IMO popular because they are free, and good enough for a lot of things. They are popular with kids, students, academics, etc. Also web developers, developing for small sites selling salsa and whatnot. Kind of like OpenOffice / LibreOffice. For most people, anything will work, they don't need much. If your database is for your online salsa store, run Linux and mySQL. Once your ass is on the line, you will develop an appreciation for Microsoft and care a lot less about a couple thousand dollars software cost. There is a reason that far more businesses trust Microsoft for critical systems, and it isn't because corporate IT guys are dumbasses. 

Now oddly Linux has a place again at the outer limits, highly specialized applications; IBM does some crazy stuff with Redhat, things that are beyond the reach of everyday corporate sorting, searching, recording and reporting transactions. I hear about it and read about it but it isn't really at my end of the pool. 

So for the bread and butter stuff, between the salsa store and the space program, if I have to choose between Windows Server and Microsoft SQL Server, or Linux and mySQL, I won't have to think long, it's going to be Microsoft.


----------



## paulengr (Oct 8, 2017)

mburtis said:


> @paulengr curious as to what Linux distros you use? I broke down yesterday and loaded Ubuntu onto an old laptop we had around the plant just to start playing with it. Thought I might play with some of the other ones (Debian, Mint) in virtual machines to try them out. The reason I'm interested at all in Linux is just for the stability aspect of it, well, and I like the philosophy of open-source software, looking at it from an industrial side as far as running a database, etc. I'm not exactly super knowledgeable about computers, but even I know that most industrial software likes to completely crash every time Windows updates. We have one SCADA computer that blue-screens every time you restart it, and has from day one. Seems like Linux can offer a more robust system if a person wants to learn just a little bit.


Mint is basically Ubuntu reskinned to look like Windows. Don't expect any huge changes because it's still Gnome as the window manager (WM). Within the Ubuntu skins, take a look at KDE Plasma. It's interesting, but the apps it comes with as replacements for the standard ones suck, and it's often not very stable, which is why I dropped it.

Going in another direction and getting out of the very limited Debian world, look at Garuda, especially the base and Wayfire editions, or maybe Manjaro. These are Arch-based distros, so no Debian in sight; they don't suffer from the heavy hand of Debian and Canonical, and they have more of a server-based background like RHEL or CentOS.

As far as me personally, here's the thing. I am doing mostly electrical or system-integrator type projects. I need to deal with sometimes unruly software, sometimes even stuff that only works on, say, 98 or 2000. I want absolute stability and reliability. The last thing I want is a customer's machine down while Windows sits there locked up for 40 minutes loading some system update that then makes it unbootable. This is not fantasy…it has happened, along with multiple incidents of deathly slow restarts on laptops. So I'm not a fan of KDE for instance…too many crashes, about as stable as Windows Vista or 64-bit Windows 7 in a VM.

When it comes to distro/OS selection for servers, RHEL as an example is the host OS of VMware ESX servers for a reason. CentOS used to be a good choice too, but I'd lean more towards Manjaro (Arch) these days. In terms of LAMP, MySQL has gone off the rails; Postgres cleaned up their act, and the new fork of MySQL is MariaDB. If you need a good embedded SQL database on any OS, SQLite is hard to beat. I'm not a big PHP fan, but at the application level often that doesn't matter. The argument isn't about free, especially considering RHEL and VMware, but reliability and compatibility.
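To make the "embedded SQL" point above concrete: an embedded database like SQLite runs inside your program with no server process at all. Here is a minimal sketch using Python's bundled sqlite3 module; the table and tag names are made up for illustration, not from any real historian.

```python
import sqlite3

# SQLite is embedded: the whole database is a single file
# (or lives in memory, as here) -- no server to install or manage.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A toy table of PLC tag readings (hypothetical names).
cur.execute("CREATE TABLE readings (tag TEXT, value REAL)")
cur.executemany(
    "INSERT INTO readings (tag, value) VALUES (?, ?)",
    [("pump1_amps", 12.4), ("pump1_amps", 13.1), ("tank_level", 87.0)],
)

# Declarative query: say what you want, not how to loop for it.
cur.execute("SELECT tag, AVG(value) FROM readings GROUP BY tag ORDER BY tag")
for tag, avg in cur.fetchall():
    print(tag, round(avg, 2))
# → pump1_amps 12.75
#   tank_level 87.0
conn.close()
```

Swap `":memory:"` for a filename like `"plant.db"` and the same code persists data to disk, which is the usual way SQLite gets embedded in logging or reporting tools.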

In my mind Microsoft has had 4 killer products: MS Basic, XP/NT Server, Office, and SQL Server. In the early days MS Basic was probably the most pirated and cloned program ever. Prior to the mid 80s there were very, very few canned software packages; Basic made the machine accessible to a wide audience. XP finally ushered in a "modern" OS that did everything the competitors did, such as true multitasking in separate memory spaces. The reason you wanted XP, though, was for running Office. Although Excel was a poor clone of Lotus 1-2-3, Word of WordPerfect, etc., the tightly integrated bundle with VBA made a very formidable package, to the point where, despite years of competitors trying to create another "Office", MS dominated for a decade. The problem, though, came from Windows at the very beginning. Everything hinged on COM/DCOM, which was designed when every Windows PC was an island. Prior to that, sockets libraries and software pipes existed, but Mac copy/paste was the gold standard. They needed to pass not just text but objects. This seemingly simple thing is the core of Office; take it away and the app is fundamentally weakened. You can't have "Office" without COM and VBA. But it wasn't built with multiple users or security in mind, so it has become this huge problem. Java was developed in response to COM, and recognizing the threat it in turn posed, MS developed .NET. Office was the last major package to walk away from COM (along with Wonderware). But the result, Office 365, pales by comparison. All the macros, apps, embedded goodness, and extensions were erased. What's left is nothing compared to the competitors, to say nothing of the ribbon thing that makes it hard to do the most basic tasks.

That leaves SQL Server. MS has finally realized that they aren't competing with Oracle on price (as in who can gouge the most) nor on performance/features (roughly equal). In the desktop market MS has a strong foothold, but in servers they are competing against the modern version of LAMP, where it is hard to charge the customer a higher price tag than the hardware itself for a "me too" product, especially with their legendary failures when it comes to software updates. Hence MS is far from dominant in the server market.

The big thing you have to contend with specifically with SQL databases, though, is that ANSI SQL only gets you so far. It is not a universal language. For instance, Postgres has a ton of specialized support for GIS (map) stuff nobody else has. MS has hooks into Office apps. MariaDB has a bunch of Java-specific features. Postgres does better as a distributed DB. And then we get into Map/Reduce, an entirely different non-SQL way of doing things, such as CouchDB, which has its advantages in its application domain (document searching, free-form databases), and data historians that are specific to trend data and have only a surface resemblance to ANSI SQL. Even MS SQL has a huge pile of nonstandard SQL features. As an end user, to some degree any decent SQL database will do some jobs well, but when you peel back the covers and start looking at the wildly different feature sets it usually becomes obvious which one to use. With MS SQL you have one and only one OS choice, except if it's in a VM. But with all the others you have multiple choices.
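The dialect drift described above shows up even in trivial queries. As a small sketch: "give me the top 3 rows" is spelled differently on nearly every engine, which is exactly why the ANSI core only gets you so far. The snippet below runs the SQLite/MySQL/Postgres spelling via Python's sqlite3; the other spellings are noted in comments and are standard documented syntax, not something from this thread.

```python
import sqlite3

# "Top N rows" across dialects -- a classic portability gotcha:
#   SQLite / MySQL / Postgres : SELECT v FROM t ORDER BY v DESC LIMIT 3
#   MS SQL Server (T-SQL)     : SELECT TOP 3 v FROM t ORDER BY v DESC
#   ANSI SQL:2008 / Oracle 12c: SELECT v FROM t ORDER BY v DESC
#                               FETCH FIRST 3 ROWS ONLY
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (v INTEGER)")
cur.executemany("INSERT INTO t (v) VALUES (?)", [(n,) for n in range(10)])

# The LIMIT spelling, which SQLite understands:
cur.execute("SELECT v FROM t ORDER BY v DESC LIMIT 3")
print(cur.fetchall())  # → [(9,), (8,), (7,)]
conn.close()
```

Send the T-SQL spelling to SQLite (or the LIMIT spelling to SQL Server) and you get a syntax error, so any code that targets more than one engine has to abstract even this.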

One reason to consider a VM regardless of the host and guest OS is this. Ever try to back up a server? How about a running one? You can, and Acronis is very good. With a VM I can just click "snapshot" and, just like that, in seconds I have a saved copy that I can back up. That's over and above automatic backup systems that do the same thing. Although many SQL systems have excellent rollback/recovery systems, what about an OS upgrade that fails (hello, MS)? With a VM just clone the VM, attempt the upgrade on the clone, and if it fails delete it; if it succeeds make the clone the primary. Easy peasy. You stop thinking of "servers" as physical hardware and more like software modules. The host OS "doesn't matter", but the best choices are VMware ESXi (RHEL), Citrix/XenServer (the original hypervisor), RHV, and VirtualBox. All have been out there a long time and are very stable and reliable. You can make MS Server do it too, but it's always somewhat suspect because it's all closed source and known to have all kinds of strange back-door stuff. As far as performance, we've clocked it: the overhead is something like less than 1% of all resources (disk, CPU, RAM), despite all the stuff it does for you and your sanity. Actually it's better: with multiple VMs you will notice performance improvements because of disk deduping and because you can better utilize idle clock cycles, so with 2-3 "big" servers instead of 5-10 "little" ones the big high-end servers come into their own.

Anyway not saying MS doesn’t have its place but if you are wanting to run high performance or high reliability it’s a big problem. Rackspace, Google, Amazon, and all the embedded companies (phones, tablets, cars) all use Linux for a reason. If you are running say Office and only Office or Factorytalk or Wonderware you are stuck with having MS Windows at least in a VM for the servers. But outside that limited environment you can run almost anything.


----------



## mburtis (Sep 1, 2018)

I appreciate the feedback. I'm constantly amazed at the volume of stuff that members on this site are knowledgeable about. I'll have to check out a couple of the Arch-based distros. Linux has always intrigued me, so it's just playing around with them right now. Learning SQL and virtual machines relates directly to being able to maintain our system, and that's the path I'm on at this point. Plus there are some really powerful reporting and visualization tools out there. In today's world there is really no sense in not utilizing some of these tools, even if it is just to impress the suits. I'll always be a relay man at heart, but if I have to learn how to herd imaginary pixies as well I think I can, to a limited degree. It is interesting how the massive advances in technology and hardware capabilities have changed some of the dynamics. Hardware is so overcapable now you can just throw crap at it and it will still work, where it used to be you had to optimize every little detail. I've still got a long time left in my career, so it will be interesting to see where it goes from here.


----------



## splatz (May 23, 2015)

paulengr said:


> Anyway not saying MS doesn’t have its place but if you are wanting to run high performance or high reliability it’s a big problem. Rackspace, Google, Amazon, and all the embedded companies (phones, tablets, cars) all use Linux for a reason. If you are running say Office and only Office or Factorytalk or Wonderware you are stuck with having MS Windows at least in a VM for the servers. But outside that limited environment you can run almost anything.


At the tiny end, Linux wins easily because it can be stripped to essentials and made to run on extremely limited hardware (busybox), and very often with these embedded devices the cost of the windows license would be prohibitive. 

Millions of web servers run linux and mysql because it's free. If a web developer can put up an ecommerce site to sell your salsa for $5000 using linux and mysql or $7500 using microsoft, they're going for the cheap. There are zillions of these servers out there for people selling salsa out of their basement etc. so the numbers are big but they are usually not doing custom SQL development and to be honest most are pretty mickey mouse and don't demand the reliability that a utility company or full scale business needs. 

Rackspace, Google, Amazon, IBM, etc. are doing highly specialized things on a huge scale and have the staff and resources to take advantage of the flexibility that Linux affords them and also benefit from the performance edge. But comparing these to more bread and butter business / industrial / institutional applications is apples to oranges. 

My point here is that the Microsoft Windows Server and SQL Server platform is the best fit for the utility company's situation. Hands down.


----------



## just the cowboy (Sep 4, 2013)

Remember:
Don't roll your own; you are not always going to be there. You are not doing anyone a favor by getting creative. If it is something you need, make them spend the money for a supported system.
Watch what you use; FT (FactoryTalk) tags for the historian are real expensive.


----------



## splatz (May 23, 2015)

just the cowboy said:


> Remember:
> *Don't roll your own; you are not always going to be there. You are not doing anyone a favor by getting creative.* If it is something you need, make them spend the money for a supported system.
> Watch what you use; FT (FactoryTalk) tags for the historian are real expensive.


*AMEN to that!*


----------

