Channel: THWACK: Discussion List - All Communities
Viewing all 21219 articles

"Check mismatched MS updates" in every Server

After deploying 7 fresh Windows Server 2016 machines with the latest updates installed, I ran the Deployment Health check and I'm getting the following error messages for every server:

 

Cannot get .NET version for the following servers:

Cannot get OS version for the following servers:

Cannot get System.Runtime.Serialization.dll or mscorlib.dll versions for the following servers:

 

Any idea how to resolve this?


The Ultimate CPU Alert ... for Linux!

Back in Oct 2014 adatole made a beautiful CPU alert (The Ultimate CPU Alert) using SAM to capture the Windows processor queue length.  Using some SQL goodness he showed us how to count up the number of CPUs and compare that count to the CPU queue length so that only truly constrained servers were alerting.

 

Enter Linux. (insert ominous sound music)

 

I have the good fortune of following in Leon's footsteps in my current role so I've inherited a bunch of his innovation to play with as a foundation.  Here's what you are going to need to play along:

 

1.  NPM - We'll be building a Universal Device Poller and a transform

2.  A Linux server or two to test against (but you wouldn't be doing this if you didn't have Linux servers, right?)

3.  An understanding of Load Average and why it is important.  (Read this if you are wondering why you need to monitor for load average)

 

Remember, load average alone isn't going to be enough to alert with clarity just as CPU load isn't enough to tell you your server isn't correctly sized for the current load.  We're going to bundle the two of them up to help give some intelligence to this whole little sordid affair.

 

The Universal Device Poller

 

  1. Log into your NPM primary polling engine (or, if you only have 1 NPM server, the *only* polling engine) and open up Orion Universal Device Poller. (Start > Programs > Solarwinds Orion > Universal Device Poller if you are using Server 2008 or earlier).
  2. Click on New Universal Device Poller
  3. Configure your UnDP as per the screenshot below.  Note the name of loadAverage15MinInt -- this is an integer.  We'll be transforming this result starting in step 4.  The OID is 1.3.6.1.4.1.2021.10.1.5.3 for those of you who want to cut and paste.  Click through the remaining screens as per normal (testing, assignment, etc.)

2014-12-17_15h20_40.png

 

     4.  Click Transform Result.

     5.  Name your transform loadAverage15Min (unless you want to name it something else, but just remember you'll need to change the name in the alert later in the process!)

     6.  Configure the transform as is shown in the screenshots below.  You can name the group whatever you want -- but it is a good idea to group the UnDP and transform in the same group, at least in my mind.

2014-12-17_15h28_51.png     2014-12-17_15h30_00.png

 

 

     You have now created a Universal Device Poller that will query an SNMP-enabled Linux server for the 15 minute load average.  If you want the 1 Minute Load Average (OID: 1.3.6.1.4.1.2021.10.1.5.1) or the 5 Minute Load Average (OID: 1.3.6.1.4.1.2021.10.1.5.2), repeat the steps above with those OIDs.
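
For reference, here is a minimal Python sketch of the arithmetic the transform is presumably doing. It assumes the OIDs above are the UCD-SNMP laLoadInt column, which reports the load average multiplied by 100 as an integer; the function name is illustrative and is not part of NPM.

```python
def load_average_from_int(la_load_int: int) -> float:
    """Convert a laLoadInt reading (load average x 100) back to a load average."""
    if la_load_int < 0:
        raise ValueError("laLoadInt should not be negative")
    return la_load_int / 100.0


# A polled integer of 425 corresponds to a load average of 4.25.
print(load_average_from_int(425))
```

In the UnDP itself this is just the poller value divided by 100 in the transform formula.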

 

The Alert

 

You've built yourself some Universal Device Pollers (15 minute and, if you are a keener, 1 minute and 5 minute) and the associated transforms.  Now you are going to build an alert.  This is going to be a custom SQL alert so remember that leaving the reset condition as a "when no longer true" isn't going to fly. You're going to need to build a reset trigger in SQL as well.  The query is a little complex (and I did steal the hardest part from Leon's Ultimate CPU Alert post), but once you understand what it is up to it will all make sense.

 

     1. Open up Advanced Alert Manager on your Orion Server.  (If you are reading this after NPM 12 has been released some time in 2015, remember the olden days of non-web accessible alerts!?!)

     2. Name your alert and set your evaluation frequency according to your standards.  (You do have standards for that sort of thing, right?)

     3. On the Trigger Condition tab select Custom SQL Query in the 'Type of Property to Monitor' drop down and then Custom Node Poller in the 'Set up your Trigger Query' drop down.

2014-12-17_15h44_43.png

 

     4. Selecting these values will pre-populate SELECT statements in the gray box.  This is a good thing as there are a whack of table joins in there to make all of this goodness work.  Copy and paste the code below in the white section of the trigger condition tab.

 

INNER join (select c1.NodeID, COUNT(c1.CPUIndex) as CPUCount

from (select DISTINCT CPUMultiLoad.NodeID, CPUMultiLoad.CPUIndex

from CPUMultiLoad) c1

group by c1.NodeID) c2 on Nodes.NodeID = c2.NodeID

 

 

WHERE  

  (  

  (Nodes.n_mute = 0) AND   

  (Nodes.OwnerGroup = 'LINUX') AND   

  (Nodes.Prod_State = 'PROD') AND   

  (   

  (CustomPollers.UniqueName = 'loadAverage15Min') AND

  (CustomPollerStatus.Rate > CPUCount)

  ) AND   

  (Nodes.CPULoad >= 95)

  )

 

     5. Set the 'Do not trigger until this condition exist for more than' to 15 minutes or some other value that is tied to your statistic collection interval.  In our case, we poll our nodes every 5 minutes.  This means that we have to experience this condition for at least 2 and up to 3 polling intervals for it to trigger.

     6. Click the Reset Condition tab (don't worry, I'll explain the query logic below!) and click the 'Reset this alert when the following conditions are met' radio button.

     7. Paste the following query text into the white space and set the 'Do not reset until this condition exist for more than' to 0 seconds (or a value that makes you happy)

 

INNER join (select c1.NodeID, COUNT(c1.CPUIndex) as CPUCount

from (select DISTINCT CPUMultiLoad.NodeID, CPUMultiLoad.CPUIndex

from CPUMultiLoad) c1

group by c1.NodeID) c2 on Nodes.NodeID = c2.NodeID

 

 

WHERE

(

  (Nodes.n_mute = 0) AND

  (Nodes.OwnerGroup = 'LINUX') AND

  (Nodes.Prod_State = 'PROD') AND

  (

   (CustomPollers.UniqueName = 'loadAverage15Min') AND

   ((CustomPollerStatus.Rate <= CPUCount) OR

   (Nodes.CPULoad < 95))

  )

)

 

     8.  Finish configuring your alert as per your standards for things like Time of Day, Trigger Actions, Reset Actions (you use them both, right?) and click OK.

 

What does it all mean, you ask?

 

Let's look at the trigger query.

 

Using Leon's code for the INNER JOIN we're going to ask NPM to select and count the number of CPUs on a given node by using the CPUMultiLoad table.

 

INNER join (select c1.NodeID, COUNT(c1.CPUIndex) as CPUCount

from (select DISTINCT CPUMultiLoad.NodeID, CPUMultiLoad.CPUIndex

from CPUMultiLoad) c1

group by c1.NodeID) c2 on Nodes.NodeID = c2.NodeID

 

Armed with a CPU count we're going to do some checking to limit our scope of influence.  In our environment we're using custom properties to limit the scope of alerts (and you should too - it sure beats letting a SQL query check against every node in your environment for CPULoad when only your Linux servers will have the UnDP assigned!).

 

WHERE  

  (  

  (Nodes.n_mute = 0) AND    

  (Nodes.OwnerGroup = 'LINUX') AND    

  (Nodes.Prod_State = 'PROD') AND   

  (   

  (CustomPollers.UniqueName = 'loadAverage15Min') AND

  (CustomPollerStatus.Rate > CPUCount)

  ) AND   

  (Nodes.CPULoad >= 95)

  )

 

Next, we ensure that the stat from the UnDP is available AND the Custom Poller value is greater than CPU Count AND CPU Load is at least 95%.  All 3 conditions have to be true.  (Remember, if you named your transform something different, this is the place to change it!)

 

WHERE  

  (  

  (Nodes.n_mute = 0) AND   

  (Nodes.OwnerGroup = 'LINUX') AND   

  (Nodes.Prod_State = 'PROD') AND   

  (   

  (CustomPollers.UniqueName = 'loadAverage15Min') AND

  (CustomPollerStatus.Rate > CPUCount)

  ) AND    

  (Nodes.CPULoad >= 95)

  )

 

The reset query looks almost exactly the same but the reset conditions are slightly different (of course!)  For the reset we want to check for the UnDP AND either the LoadAverage less than or equal to the CPUCount OR CPULoad less than 95%.  Your tolerance for CPU load may be different from ours, so feel free to adjust the threshold accordingly.  The key is that the reset will happen if either the CPU load is less than 95% OR the load average is no greater than the CPU count for all nodes that are polled for the loadAverage15Min.

 

INNER join (select c1.NodeID, COUNT(c1.CPUIndex) as CPUCount

from (select DISTINCT CPUMultiLoad.NodeID, CPUMultiLoad.CPUIndex

from CPUMultiLoad) c1

group by c1.NodeID) c2 on Nodes.NodeID = c2.NodeID

 

WHERE

(

  (Nodes.n_mute = 0) AND

  (Nodes.OwnerGroup = 'LINUX') AND

  (Nodes.Prod_State = 'PROD') AND

  (

   (CustomPollers.UniqueName = 'loadAverage15Min') AND

   ((CustomPollerStatus.Rate <= CPUCount) OR

   (Nodes.CPULoad < 95))

  )

)
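
If it helps to see the two conditions side by side, here is a small Python sketch of the same logic outside of SQL. The class and function names are made up for illustration, and the custom property scoping (n_mute, OwnerGroup, Prod_State) is left out because it is identical in both queries.

```python
from dataclasses import dataclass

CPU_LOAD_THRESHOLD = 95  # percent, same value as the queries above


@dataclass
class NodeSample:
    load_average_15min: float  # transformed UnDP value (CustomPollerStatus.Rate)
    cpu_count: int             # distinct CPUIndex rows in CPUMultiLoad
    cpu_load: float            # Nodes.CPULoad, in percent


def should_trigger(s: NodeSample) -> bool:
    # More runnable work than CPUs AND the CPUs are nearly saturated.
    return s.load_average_15min > s.cpu_count and s.cpu_load >= CPU_LOAD_THRESHOLD


def should_reset(s: NodeSample) -> bool:
    # Either half of the trigger condition no longer holds.
    return s.load_average_15min <= s.cpu_count or s.cpu_load < CPU_LOAD_THRESHOLD


constrained = NodeSample(load_average_15min=6.2, cpu_count=4, cpu_load=98)
busy_but_fine = NodeSample(load_average_15min=2.0, cpu_count=4, cpu_load=99)
print(should_trigger(constrained), should_reset(constrained))
print(should_trigger(busy_but_fine), should_reset(busy_but_fine))
```

Written this way it is easy to see that the reset condition is the exact negation of the trigger condition, so a node in scope always satisfies one or the other.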

 

That's it!  Empowered with a UnDP and this alert logic you are ready to ditch the out of the box CPU alerts and start giving your Linux (and with Leon's Ultimate CPU Alert, Windows) support teams really refined alerts giving them more time for important things (like Thwack Monthly contests!)

 

Feel free to comment, criticize or otherwise critique.

 

Message was edited by: Joshua Biggley - fixed grammar and added a link to the custom SQL reset condition discussion on Thwack.

IIS 10.0 Registry Tuning Settings (IIST-SV-000151 / V-76755)

Hello,

 

I have a STIG requirement (The IIS 8.5 web server must be tuned to handle the operational requirements of the hosted application.) to configure the following registry values:

 

HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters\"URIEnableCache"

HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters\"UriMaxUriBytes"

HKLM\SYSTEM\CurrentControlSet\Services\HTTP\Parameters\"UriScavengerPeriod"

 

The STIG does not contain any specific recommendations for these values.

 

Are there recommended values available from SolarWinds?

ADP connections to Unknown App(s)

Hi there,

 

We absolutely LOVE the new ADP feature but it has left us wanting more in terms of the information presented to the user.

One bit of information that is lacking is the PID of the process behind the Unknown App(s). Without it, we can't tell which process the application is connecting to. We could of course remote into the machine and run netstat -anob | find "52734" to get the PID, but it isn't that simple, as you'll see below.
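
As a rough illustration of that manual lookup, the Python sketch below pulls the owning PID for a port out of captured netstat text. It assumes the plain `netstat -ano` column layout (protocol, local address, foreign address, optional state, PID) rather than `-anob`, which prints the executable name on a separate line; the sample output, addresses and PIDs are invented.

```python
from typing import Set

# Invented sample of `netstat -ano` output, for illustration only.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       888
  TCP    10.0.0.5:52734         10.0.0.9:17777         ESTABLISHED     4321
  UDP    0.0.0.0:514            *:*                                    1234
"""


def pids_for_port(netstat_output: str, port: int) -> Set[int]:
    """Return the PIDs of every connection whose local or foreign port matches."""
    wanted = str(port)
    pids = set()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip headers and blank lines; UDP rows have no State column.
        if len(parts) < 4 or parts[0] not in ("TCP", "UDP"):
            continue
        local, foreign, pid = parts[1], parts[2], parts[-1]
        ports = {addr.rsplit(":", 1)[-1] for addr in (local, foreign)}
        if wanted in ports and pid.isdigit():
            pids.add(int(pid))
    return pids


print(pids_for_port(SAMPLE, 52734))
```

Matching on the port column instead of a raw substring avoids false hits where the number happens to appear inside an address or another port.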

 

Here's what I mean:

 

Orion is connecting to svchost.exe on our domain controller. That svchost.exe instance is not already monitored and the connection metrics are therefore not present.

If I click on Start Monitoring, I am taken to the Component Monitor Wizard, which gives me the option to create a new monitor/template/component. At this point, we wish this was easier, especially since there are 30+ instances of svchost.exe running on the domain controller.

Which one do we choose? Even if selecting all of them, we might not pick the right ones if new instances of this process are created in the future.

 

Either way, in order for us to properly identify the process in question, we would need a PID. Is this something that could be added easily?

 

 

 

aLTeReGo thoughts on this?

How to create separate views for groups?

There has to be a better answer to what I'm trying to accomplish here. Let me give you a little background on my network. We are a new WISP and have ~70 wireless towers. Each tower has 4-8 base-stations, 1 switch, and 1 UPS. I put all of the equipment of each tower/pole into its own group, then put all those groups into a root group called... Towers/Poles. I have many other groups defined as well: groups of Locations, Roles and Device Types for all of our nodes.

Now, when I'm trying to create views for these groups, I expected it to function just like setting views for nodes but this is not the case. It keeps changing the view of EVERY GROUP. Why?! Why would anyone ever want this?! I have wasted so much time trying to get each pole in its own view. Essentially I want every tower/pole to have a summary view of the equipment with some stats and a topology.

https://support.solarwinds.com/SuccessCenter/s/article/How-to-create-a-separate-group-details-view

I saw this article and while it did describe a way to get past this limitation, it put the "groups" in one place. And the WRONG place. It would have been somewhat acceptable if it added another menu to the actual menu bar but it does not. One thing I have noticed about NPM is that its terminology is not what it seems...

2019.4 installer getting stuck at Job engine

Hi All,

 

Has anyone come across this while doing a fresh installation? I see an error related to 1603 in the log file, nothing else....

 

Synthetic Testing of Multiple Services

I know I've seen several mentions of synthetic testing in the past and even found a PDF for SolarWinds Synthetic End User Monitor, but I can't find an actual product that does this.

I see Quality of Experience monitoring, but nothing actually performing some of the synthetic testing.

 

What I'm getting at is, how can I ensure services such as DHCP, DNS, or websites are loading and functioning in an acceptable time-frame?

 

I would like something to perform DNS look-ups and ensure the response is both correct and received within a few milliseconds.
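
To make the kind of check I mean concrete, here is a minimal Python sketch, not a SolarWinds feature. It times a lookup through the operating system's resolver (so a local cache can answer, and it does not target one specific DNS server) and compares the answer with an expected address. The host name, address and threshold are placeholders, and the stub resolver exists only so the example runs without a network.

```python
import socket
import time


def check_dns(hostname, expected_ips, max_seconds, resolver=socket.gethostbyname_ex):
    """Resolve hostname and report whether the answer is correct and fast enough."""
    start = time.perf_counter()
    try:
        _name, _aliases, addresses = resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return {"ok": False, "reason": f"lookup failed: {exc}"}
    elapsed = time.perf_counter() - start
    if not set(addresses) & set(expected_ips):
        return {"ok": False, "reason": f"unexpected answer: {addresses}"}
    if elapsed > max_seconds:
        return {"ok": False, "reason": f"too slow: {elapsed:.3f}s"}
    return {"ok": True, "reason": f"answered in {elapsed:.3f}s"}


def stub_resolver(hostname):
    # Stand-in for a real lookup; 192.0.2.10 is a documentation-only address.
    return (hostname, [], ["192.0.2.10"])


result = check_dns("intranet.example.com", ["192.0.2.10"], 0.05, resolver=stub_resolver)
print(result["ok"], result["reason"])
```

Leaving out the resolver argument makes it use the real system resolver, which is what a small probe box at a remote site would run on a schedule.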

 

I would imagine this being an agent you could install on a Windows or Linux server, preferably something that could run on Raspberry Pi and toss in a closet at some of our remote sites.

 

Thank you!

SolarWinds Module Interaction: WPM / SAM with AppMon

Hello -

 

I'm still relatively new to SW so I've been learning by fire

 

I have a SW server with some custom App Monitors configured. 

 

The server has SAM and WPM installed right now but we are looking at removing WPM.

 

My question is: is there a way for me to tell if a customized Application Monitor is specifically using WPM?  This is the question I need answered.

 

Also, generally, is there a way to tell which module (SAM, WPM, NCM, NTA, etc.) is being used on a server in alerts, application monitors, etc.?  With all of the different modules, at times it's difficult to ascertain which module is being used for a specific purpose.

 

TIA.

 

FL


KIWI Syslog Server showing msgs from Unix and CISCO but not Windows

Hey guys,

 

Wondering if someone can help as I've been pulling my hair out for 2 days with this:

 

Installed the EVAL 14 day Trial version of KIWI Syslog Server (9.6.7) and put it on a Windows Server 2016 VM. Server is set up to log messages to a file and display received messages to the default view. UDP and TCP ports are ticked and using standard port numbers for both protocols.

 

Unix and CISCO devices are coming up in the Syslog server nicely and are being displayed in console.

 

Windows is a no go - will not display messages in the console.

 

Installed Windows Log Forwarder on Win 10 and Server 2012 machines and set the server IP and UDP port number to match the Syslog Server. Set up a subscription to look for application error events with an ID of 0, the same ID the Solarwinds test event shows up as (this comes up in the event preview at the bottom so I know there are events to send to the syslog server), then set it to Kernel message.

 

Ran a test on the application log as an error and this comes up in event viewer.

 

I am not seeing it come up in the Syslog console.

 

I can ping the syslog server from the client, firewalls are turned off on all client PCs AND on the server. AV has been uninstalled on one machine. No other blocking software exists.

 

I installed the log generator on the syslog server - set IP to client PC and syslog server IP and it generated message in the syslog console.

Installed log generator on client PC, with same settings, and it won't show up in Syslog console.

 

Am I doing something stupidly wrong here? I've tried all the forums and everything online. I even added the computer account of the syslog server to the Event Log Readers group on one of the Windows boxes, and no GPOs are blocking connections to the port or to the event logs themselves.

 

Need to confirm Windows sends logs before we buy this product and at the moment it's not playing ball.

 

Any help would be hugely appreciated! Even some netstat-type commands would help, as I've tried the netstat -ano command on the client and the UDP port isn't showing up anywhere (running the command on the syslog server does show the UDP port assigned to syslog and no other process).
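
One way to take the forwarder out of the picture is to send a hand-built syslog datagram from the client and watch for it in the console. The Python sketch below is an illustration under assumptions: it builds a BSD-style (RFC 3164) message and sends it over UDP to port 514, the usual syslog port. The host name, tag and server address are placeholders, and nothing here is specific to Kiwi Syslog Server or the Log Forwarder.

```python
import socket
from datetime import datetime


def build_syslog_message(facility, severity, hostname, tag, text, now=None):
    """Build an RFC 3164 style message: <PRI>Mmm dd hh:mm:ss host tag: text."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # e.g. user-level (1) + error (3) = 11
    now = now or datetime.now()
    # RFC 3164 pads single-digit days with a space, not a zero.
    timestamp = f"{now:%b} {now.day:2d} {now:%H:%M:%S}"
    return f"<{pri}>{timestamp} {hostname} {tag}: {text}".encode("ascii", "replace")


def send_udp(message, server, port=514):
    """Send one datagram; UDP gives no delivery confirmation either way."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (server, port))


message = build_syslog_message(1, 3, "client01", "test", "hello from the client")
print(message)
# send_udp(message, "192.0.2.20")  # placeholder address: use the syslog server's IP
```

Because UDP is connectionless, the client side of such a send will not show up as an established connection in netstat, so the only confirmation is the message arriving on the server.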

 

No error logs in syslog application

 

Regards,

 

Clare Martin

Hardware Health Sensors not available

Hi All,

 

I am evaluating Solarwinds for a client and checking features with the help of the Admin guide. The hardware sensors option itself is unavailable under node details for any of the devices (which include a core switch, VMware vSphere, cameras, UPS, etc.). Additionally, the manage hardware sensors option does not list any of the devices and is totally empty. I would really appreciate it if anyone could guide me on this. I was trying to see if there are any prerequisites for these details to appear but I could not find anything in the admin guide.

 

The screenshots attached are from the admin guide; and these details are unavailable for my trial installation.

 

Thanks

Raj

Whitelisted devices being displayed as rogue

I recently upgraded to 2019.4 and now I have a slew of rogue devices that had been previously whitelisted. Even when I make a new whitelist rule it treats those devices as rogues. Has anyone else experienced this? How do I fix it?

DPA Orion Integration Module Question

Hello DPA -

 

I did the 2019.4 update to DPA and was going to run the Orion integration module and then had a question.   All of my other Orion modules are on 2019.2 and DPA is on 2019.4 - if I run this will it upgrade Orion to 2019.4 or will it leave it at 2019.2?  Thanks!

 

NCM Manage Report User Permissions

Hi All

 

I've just had to give a SolarWinds Account group complete admin rights to allow them to create a new NCM report (i.e. Admin->Settings->NCM Settings->Manage Policy Reports->Create New Report)

I initially increased their NCM specific permissions to Engineer and then Administrator but neither of these actions allowed access to create a new report.

 

Am I missing something somewhere or is the global admin permission really required ?

 

regards

 

Steve

How does one allow a user to view and edit an NPM Report without giving them Administrator Rights?

I want a user to only have the ability to modify and view a report, but I don't seem to be able to do it without giving them Admin rights.

 

What, other than Allow Administrator Rights, is required in addition to Manage Reports to view and edit and create a Report?

 

Removing node from Solarwinds when uninstalling agent

I have automated a way for newly provisioned systems to have Solarwinds agents installed using msi and mst files. The systems get added to Solarwinds automatically after the agent installation and configuration is done.

 

Is there a way to reverse it? I.e., is there a way to uninstall the agent and remove the node from Solarwinds automatically? If I uninstall the agent, the node isn't removed from the node list but shows as down.


Where are the IPSLA thresholds for Maximum?

I found the table for IPSLA thresholds, but it's missing the MAXIMUM values.  Maybe tdanner has an idea on where it is?  

 

 

The table (Orion.IpSla.OperationThresholds) I found only has the WARNING and CRITICAL

Does anyone know where the MAXIMUM value is kept?

 

Thank you

Patch Manager Credential Ring rules and server accounts

What I'm trying to do in Credential Ring is set up a rule where workstations are assigned to one domain user account and servers to another. Basically, because of security concerns and the wide variety of SolarWinds packages I have, I've had to split out the service account for different things. I have a service account that SAM uses to connect to the client workstations and that account seems perfect for Patch Manager to also use BUT servers don’t have that account enabled, and I need to keep it that way. What I want to do is assign a less privileged account to the servers and then add that account to the credential ring for servers only. Another thing I need to add… forget about filtering by AD, that’s a disaster and servers are scattered everywhere. Question is, how do I add a rule that states if a device is located in this WSUS OU then assign this credential and then … Anyone have documentation on creating a least priv user account for Patch?

Orion Map - Solarwinds

Hi there, I am relatively new to Solarwinds and am noticing that at various times our links from the core show red and then turn green.

Any idea how I can find out what is causing this?

Unable to upgrade to 2019.4 due to installer error

Hi,

 

I have been trying to perform an upgrade from 12.5 to 2019.4 but I'm encountering a strange issue where the installer asks me to stop the following services:

 

I guess I could stop them and try to continue but I do not think that this is normal behavior. This happens at the RabbitMQ installation step (there are some other steps before that and they are completed successfully).

I have restored the VM from snapshot a few times already but I always get this warning. I believe that the installer should stop all required services on its own.

 

Has anyone encountered anything similar? Support case# 00460098.

Regex for Node Names in Dynamic Group Custom Queries

I'm hoping to get some help here on this.

 

I'm trying to build Dynamic Groups to automate template application in SAM (yay!).

In order to meet my requirements for the group, I need to use regular expressions since simple "contains" and "ends with" and so on do not allow for enough sophistication.

A support ticket gave me the answer that regex is supported by Dynamic Groups, and that it is supported for Node Names.

 

I have found many examples on the Thwack forums of regex being used for IPs, but 0 for node names. I've been unsuccessful in creating any regular expressions on node names as well.

Has anyone had any luck with this?
