16 Jan 2014

How to Configure the Linux Out-of-Memory Killer

This article describes the Linux out-of-memory (OOM) killer and how to find out why it killed a particular process. It also provides methods for configuring the OOM killer to better suit the needs of many different environments.

About the OOM Killer

When a server that's supporting a database or an application server goes down, it's often a race to get critical services back up and running, especially on an important production system. When attempting to determine the root cause after the initial triage, it's often a mystery why the application or database suddenly stopped functioning. In certain situations, the root cause can be traced to the system running low on memory and killing an important process in order to remain operational.


The Linux kernel allocates memory upon the demand of the applications running on the system. Because many applications allocate their memory up front and often don't utilize the memory allocated, the kernel was designed with the ability to over-commit memory to make memory usage more efficient. This over-commit model allows the kernel to allocate more memory than it actually has physically available. If a process actually utilizes the memory it was allocated, the kernel then provides these resources to the application. When too many applications start utilizing the memory they were allocated, the over-commit model sometimes becomes problematic and the kernel must start killing processes in order to stay operational. The mechanism the kernel uses to recover memory on the system is referred to as the out-of-memory killer or OOM killer for short.
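To see how far a given machine has over-committed, you can compare two lines of /proc/meminfo. Committed_AS is the memory promised to processes, and CommitLimit is the ceiling. The current policy is in /proc/sys/vm/overcommit_memory: 0 is heuristic over-commit (the default), 1 is always over-commit, and 2 is strict accounting. Note that CommitLimit is only enforced in mode 2. The helper below is a sketch; its name and output format are our own.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): summarize over-commit from /proc/meminfo-style text.
# Committed_AS = memory promised to processes; CommitLimit = ceiling that is only
# enforced when vm.overcommit_memory=2.
overcommit_status() {
  awk '
    $1 == "CommitLimit:"  { limit = $2 }
    $1 == "Committed_AS:" { committed = $2 }
    END {
      if (limit == "" || committed == "" || limit == 0) {
        print "CommitLimit/Committed_AS not found" > "/dev/stderr"
        exit 1
      }
      printf "committed=%d kB limit=%d kB ratio=%d%%\n", committed, limit, committed * 100 / limit
    }
  ' "$@"
}

# Usage on a live system:
#   echo "overcommit_memory mode: $(cat /proc/sys/vm/overcommit_memory)"
#   overcommit_status /proc/meminfo
```

A ratio well above 100% is normal in the default heuristic mode. It simply shows how much memory processes could touch that does not physically exist.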

Finding Out Why a Process Was Killed

When troubleshooting an application that has been killed by the OOM killer, there are several clues that might shed light on how and why the process was killed. In the following example, we look at our syslog to see whether we can locate the source of the problem. The oracle process was killed by the OOM killer because of an out-of-memory condition. The capital K in Killed indicates that the process was killed with signal 9 (SIGKILL), which is usually a good sign that the OOM killer is the culprit.

grep -i kill /var/log/messages*
host kernel: Out of Memory: Killed process 2592 (oracle).
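The exact wording of these kernel messages differs between kernel versions. Rather than relying on one phrasing, a small filter can pull the PID and process name out of any "Killed process <pid> (<name>)" line. This is a sketch, and the function name is hypothetical. It reads log text on stdin, so it works on the syslog files above or on the kernel ring buffer via dmesg.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print "<pid> <name>" for every OOM kill found
# in kernel log text read from stdin or from the files given as arguments.
oom_victims() {
  # Matches lines such as: "Out of Memory: Killed process 2592 (oracle)."
  sed -nE 's/.*[Kk]illed process ([0-9]+) \(([^)]+)\).*/\1 \2/p' "$@"
}

# Usage:
#   oom_victims /var/log/messages*
#   dmesg | oom_victims
```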

We can also examine the status of low and high memory usage on a system. It's important to note that these values are real time and change depending on the system workload; therefore, these should be watched frequently before memory pressure occurs. Looking at these values after a process was killed won't be very insightful and, thus, can't really help in investigating OOM issues.

[root@test-sys1 ~]# free -lm
             total       used       free     shared    buffers     cached
Mem:           498         93        405          0         15         32
Low:           498         93        405
High:            0          0          0
-/+ buffers/cache:         44        453
Swap:         1023          0       1023

On this test virtual machine, we have 498 MB of low memory, of which 405 MB is free, and no swap is in use. The -l switch shows high and low memory statistics, and the -m switch puts the output in megabytes to make it easier to read.

[root@test-sys1 ~]# egrep 'High|Low' /proc/meminfo
HighTotal:             0 kB
HighFree:              0 kB
LowTotal:         510444 kB
LowFree:          414768 kB

The same data can be obtained by examining /proc/meminfo and looking specifically at the high and low values. However, with this filter, we don't get swap information in the output, and the values are in kilobytes.
Low memory is memory that the kernel keeps permanently mapped into its own address space, so it can access it directly. High memory is not permanently mapped; the kernel must temporarily map it to a virtual address before it can access it. You will see both low and high memory on older 32-bit systems, where the kernel's part of the virtual address space is too small to map all physical memory. On 64-bit platforms, the kernel's address space is large enough to map all physical memory directly, so all system memory is shown as low memory.
While /proc/meminfo and the free command are useful for knowing what our memory usage is "right now," there are occasions when we want to look at memory usage over a longer period. The vmstat command is quite useful for this.
In the example in Listing 1, we use the vmstat command to sample our resources every 45 seconds, 10 times. The -S M option (written here as -SM) sets the display unit to megabytes to make the output easier to read. As you can see, something is consuming our free memory (free drops from 221 MB to about 60 MB), but we are not yet swapping (si and so stay at 0).

[root@localhost ~]# vmstat -SM 45 10
procs -----------memory-------- ---swap-- -----io-- --system-- ----cpu---------
 r  b   swpd  free  buff  cache  si   so   bi   bo   in    cs us  sy  id  wa st
 1  0      0   221   125     42   0    0    0    0   70     4  0   0  100  0  0
 2  0      0   192   133     43   0    0  192   78  432  1809  1  15   81   2 0
 2  1      0    85   161     43   0    0  624  418 1456  8407  7  73    0  21 0
 0  0      0    65   168     43   0    0  158  237  648  5655  3  28   65   4 0
 3  0      0    64   168     43   0    0    0    2 1115 13178  9  69   22   0 0
 7  0      0    60   168     43   0    0    0    5 1319 15509 13  87    0   0 0
 4  0      0    60   168     43   0    0    0    1 1387 15613 14  86    0   0 0
 7  0      0    61   168     43   0    0    0    0 1375 15574 14  86    0   0 0
 2  0      0    64   168     43   0    0    0    0 1355 15722 13  87    0   0 0
 0  0      0    71   168     43   0    0    0    6  215  1895  1   8   91   0 0

Listing 1
The output of vmstat can be redirected to a file using the following command. We can also increase the interval and the number of samples to monitor over a longer period, and we can look at the output file at any time while the command is running.
In the following example, we sample memory every 120 seconds, 1000 times. The & at the end of the line runs the command in the background so that we get our terminal back.

vmstat -SM 120 1000 > memoryusage.out &
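Once memoryusage.out has collected some samples, a short awk filter can summarize it. The sketch below reports the lowest free value and how many samples showed any swapping. It assumes vmstat's default column layout (r b swpd free buff cache si so ...), and the helper name is our own. Keep in mind that in the first data row, the rate columns such as si and so are averages since boot rather than values for the interval.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): summarize a log written by "vmstat -SM ...".
# Assumes vmstat's default columns: r b swpd free buff cache si so bi bo ...
vmstat_summary() {
  awk '
    $1 ~ /^[0-9]+$/ {                 # data rows start with a number; header rows do not
      free = $4; si = $7; so = $8
      if (min == "" || free < min) min = free
      if (si > 0 || so > 0) swapping++
      n++
    }
    END { printf "samples=%d min_free=%s swap_samples=%d\n", n, min, swapping }
  ' "$@"
}

# Usage: vmstat_summary memoryusage.out
```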

For reference, Listing 2 shows a section from the vmstat man page that provides additional information about the output the command provides. This is the memory-related information only; the command provides information about both disk I/O and CPU usage as well.

   Memory
       swpd: the amount of virtual memory used.
       free: the amount of idle memory.
       buff: the amount of memory used as buffers.
       cache: the amount of memory used as cache.
       inact: the amount of inactive memory. (-a option)
       active: the amount of active memory. (-a option)

   Swap
       si: Amount of memory swapped in from disk (/s).
       so: Amount of memory swapped to disk (/s).

Listing 2
There are a number of other tools available for monitoring memory and system performance when investigating issues of this nature. Tools such as sar (System Activity Reporter) and dtrace (Dynamic Tracing) are quite useful for collecting specific data about system performance over time. For even more visibility, the dtrace stability and data stability probes include a trigger for OOM conditions that fires if the kernel kills a process due to an OOM condition. More information about dtrace and sar is included in the "See Also" section of this article.
Besides the workload simply exhausting RAM and available swap space, several other things might cause an OOM event. The kernel might not be able to use swap space optimally because of the type of workload on the system. Applications that use mlock() or HugePages have memory that can't be swapped to disk when the system starts to run low on physical memory. Kernel data structures can also take up too much space, exhausting memory on the system and causing an OOM situation. Many NUMA-based systems can experience OOM conditions when one node runs out of memory and triggers an OOM in the kernel, even though plenty of memory is left on the remaining nodes. More information about OOM conditions on machines that have the NUMA architecture can be found in the "See Also" section of this article.
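Two of these situations can be checked directly. /proc/meminfo reports locked and HugePages memory (Mlocked, Unevictable, HugePages_*). On NUMA systems, each node has its own meminfo file under /sys/devices/system/node/, and if numactl is installed, numactl --hardware shows similar per-node totals. The two functions below are hypothetical helpers, sketched under those assumptions.

```bash
#!/usr/bin/env bash
# Sketch: memory that cannot be swapped out (mlock()ed pages and HugePages),
# read from /proc/meminfo-style text on stdin or in the given file.
unswappable_memory() {
  grep -E '^(Unevictable|Mlocked|HugePages_Total|HugePages_Free|Hugepagesize):' "$@"
}

# Sketch: free memory per NUMA node from per-node meminfo files, whose lines
# look like "Node 0 MemFree:   123456 kB".
numa_free() {
  awk '$1 == "Node" && $3 == "MemFree:" { printf "node%s %s kB\n", $2, $4 }' "$@"
}

# Usage on a live system:
#   unswappable_memory /proc/meminfo
#   numa_free /sys/devices/system/node/node*/meminfo
```

A node that shows very little free memory while the others have plenty points toward the NUMA case described above.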

Configuring the OOM Killer

The OOM killer on Linux has several configuration options that give developers some choice over how the system behaves when it is faced with an out-of-memory condition. The right settings depend on the environment and the applications running on the system.
Note: It's suggested that testing and tuning be performed in a development environment before making changes on important production systems.
In some environments where a system runs a single critical task, rebooting when the system runs into an OOM condition might be a viable way to return it to operational status quickly without administrator intervention. While not an optimal approach, the logic is that if our application can't operate because the OOM killer killed it, a reboot will restore the application, provided it starts automatically at boot time. If the application is started manually by an administrator, this option is not beneficial.
The following settings cause the system to panic and reboot in an out-of-memory condition. The sysctl commands apply the settings immediately, and appending them to /etc/sysctl.conf makes them survive reboots. Replace X in kernel.panic with the number of seconds the system should wait before rebooting, adjusted to meet the needs of your environment.

sysctl -w vm.panic_on_oom=1    # panic on OOM instead of killing a process
sysctl -w kernel.panic=X       # reboot X seconds after a panic
echo "vm.panic_on_oom=1" >> /etc/sysctl.conf
echo "kernel.panic=X" >> /etc/sysctl.conf
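Running the echo ... >> /etc/sysctl.conf lines more than once appends duplicate entries. A small helper can replace any existing assignment for the key before appending the new one. This is a sketch, and the function name is hypothetical. After editing the file, sysctl -p reloads it, and sysctl vm.panic_on_oom kernel.panic shows the live values.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): set KEY=VALUE in a sysctl config file, dropping
# earlier assignments to KEY instead of appending duplicates.
set_sysctl_conf() {
  local key=$1 value=$2 file=${3:-/etc/sysctl.conf} tmp
  tmp=$(mktemp) || return 1
  if [[ -f $file ]]; then
    # Copy every line except existing assignments to $key (blanks around the
    # key are ignored; comment lines are kept as they are).
    awk -v k="$key" '{
      line = $0
      sub(/^[ \t]+/, "", line)
      split(line, kv, "=")
      name = kv[1]
      sub(/[ \t]+$/, "", name)
      if (name != k) print
    }' "$file" > "$tmp"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Usage (10 seconds is only an example value):
#   set_sysctl_conf vm.panic_on_oom 1
#   set_sysctl_conf kernel.panic 10
#   sysctl -p
```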

We can also tune how the OOM killer treats individual processes. Take, for example, the oracle process (PID 2592) that was killed earlier; once it has been restarted it will have a new PID, so substitute the current one. If we want to make our oracle process less likely to be killed by the OOM killer, we can do the following.

echo -15 > /proc/2592/oom_adj

We can make the OOM killer more likely to kill our oracle process by doing the following.

echo 10 > /proc/2592/oom_adj

If we want to exclude our oracle process from the OOM killer completely, we can do the following. Note that this might cause unexpected behavior depending on the resources and configuration of the system: if the kernel is unable to kill the process that is using a large amount of memory, it will move on to other available processes. Some of those might be important operating system processes, and killing them might ultimately bring the system down.

echo -17 > /proc/2592/oom_adj

Valid values for oom_adj range from -16 to +15, and the special value -17 exempts a process entirely from the OOM killer. The higher the number, the more likely the process will be selected for termination if the system encounters an OOM condition. The contents of /proc/2592/oom_score show how likely a process is to be killed by the OOM killer: the higher the score, the more likely it is to be killed, and a process that is exempt from the OOM killer reports a score of 0.
The following commands switch the kernel to strict overcommit accounting, which is the closest thing to disabling the OOM killer. With vm.overcommit_memory=2, the kernel refuses memory allocations beyond its commit limit (the CommitLimit line in /proc/meminfo), so applications receive allocation errors instead of having processes killed later. This makes OOM kills much less likely but does not strictly guarantee that the OOM killer never runs. This is not recommended for production environments, because if an out-of-memory condition does present itself, there could be unexpected behavior depending on the available system resources and configuration, anything from application failures to a kernel panic or a hang.
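On kernels 2.6.36 and later, oom_adj is kept mainly for backward compatibility. The preferred interface is /proc/<pid>/oom_score_adj, which ranges from -1000 to +1000, where -1000 disables OOM killing for the process. The kernel scales oom_adj writes onto oom_score_adj. The sketch below mirrors that scaling as we understand it from the kernel source: value × 1000 / 17, with +15 mapped to +1000. It then writes whichever file the running kernel provides. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: map a legacy oom_adj value (-17..15) onto the oom_score_adj scale (-1000..1000).
oom_adj_to_score_adj() {
  local adj=$1
  if (( adj < -17 || adj > 15 )); then
    echo "oom_adj out of range: $adj" >&2
    return 1
  fi
  if (( adj == 15 )); then
    echo 1000
  else
    echo $(( adj * 1000 / 17 ))   # integer division truncates toward zero, as in C
  fi
}

# Sketch: apply a legacy-style adjustment to a PID through oom_score_adj when the
# kernel provides it, falling back to oom_adj on older kernels.
set_oom_adjust() {
  local pid=$1 adj=$2 score
  if [[ -w /proc/$pid/oom_score_adj ]]; then
    score=$(oom_adj_to_score_adj "$adj") || return 1
    echo "$score" > "/proc/$pid/oom_score_adj"
  else
    echo "$adj" > "/proc/$pid/oom_adj"
  fi
}

# Usage (as root):
#   set_oom_adjust 2592 -15          # writes -882 to oom_score_adj on newer kernels
#   cat /proc/2592/oom_score_adj
```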

sysctl vm.overcommit_memory=2
echo "vm.overcommit_memory=2" >> /etc/sysctl.conf
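In strict mode, the ceiling is CommitLimit. The kernel computes it as swap plus vm.overcommit_ratio percent of RAM; the ratio defaults to 50, and reserved HugePages are subtracted from RAM first, which this sketch ignores. Applied to the test VM from the free output earlier (498 MB RAM, 1023 MB swap), the default ratio gives roughly 1023 + 249 = 1272 MB. The arithmetic as a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: estimate CommitLimit (in kB) under vm.overcommit_memory=2.
# Arguments: MemTotal kB, SwapTotal kB, overcommit ratio in percent (default 50).
# Assumption: no HugePages are reserved (the kernel subtracts them from RAM first).
commit_limit_kb() {
  local mem_kb=$1 swap_kb=$2 ratio=${3:-50}
  echo $(( swap_kb + mem_kb * ratio / 100 ))
}

# Usage on a live system:
#   mem=$(awk '$1 == "MemTotal:" {print $2}' /proc/meminfo)
#   swap=$(awk '$1 == "SwapTotal:" {print $2}' /proc/meminfo)
#   commit_limit_kb "$mem" "$swap" "$(cat /proc/sys/vm/overcommit_ratio)"
```

Comparing the estimate with the CommitLimit line in /proc/meminfo before switching modes shows how much headroom applications would have.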

For some environments, these configuration options are not optimal and further tuning and adjustments might be needed. Configuring HugePages for your kernel can assist with OOM issues depending on the needs of the applications running on the system.

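## My solution: protect MySQL by adding this line to the system crontab (it includes the user field, so it belongs in /etc/crontab or /etc/cron.d); it re-applies the setting every hour: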
0 * * * * root echo -15 > /proc/$(cat /var/run/mysqld/mysqld.pid)/oom_adj
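The crontab line above fails noisily when mysqld is not running. If the PID file is missing, cat prints nothing and the echo targets /proc//oom_adj. A slightly more defensive version checks the PID file and the /proc entry first. This is a sketch; the function name, and the idea of calling it from a small script in the crontab line, are hypothetical.

```bash
#!/bin/sh
# Sketch: lower the OOM priority of the process named in a PID file.
# Returns non-zero (with a message) instead of writing to a bogus /proc path.
protect_from_oom() {
  pidfile=$1
  adj=$2
  [ -r "$pidfile" ] || { echo "cannot read PID file: $pidfile" >&2; return 1; }
  pid=$(cat "$pidfile")
  case $pid in
    ''|*[!0-9]*) echo "invalid PID in $pidfile: '$pid'" >&2; return 1 ;;
  esac
  [ -w "/proc/$pid/oom_adj" ] || { echo "process $pid not running (or no permission)" >&2; return 1; }
  echo "$adj" > "/proc/$pid/oom_adj"
}

# Example: protect_from_oom /var/run/mysqld/mysqld.pid -15
```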

10 Jan 2014

Google Zeitgeist | Here's to 2013

Quote #421587

Delete old files in a folder

Delete files older than 90 days (regular files only):
$ find /var/log -type f -mtime +90 -exec rm {} \;
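Before deleting anything under /var/log, it is worth listing what would be removed. The sketch below adds a dry-run switch; the function name is hypothetical. Note that -delete is supported by GNU and BSD find but is not part of POSIX.

```bash
#!/usr/bin/env bash
# Sketch: remove regular files under DIR last modified more than DAYS days ago,
# printing each path. Pass --dry-run as the third argument to only list them.
delete_older_than() {
  local dir=$1 days=$2 mode=${3:-}
  if [[ $mode == --dry-run ]]; then
    find "$dir" -type f -mtime +"$days" -print
  else
    find "$dir" -type f -mtime +"$days" -print -delete
  fi
}

# Usage:
#   delete_older_than /var/log 90 --dry-run   # review first
#   delete_older_than /var/log 90
```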

Gzip all files recursively, each into its own .gz file

find . -type f ! -name '*.gz' ! -name '*.zip' ! -name '*.tgz' -exec gzip "{}" \;
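The command above starts one gzip process per file (-exec ... \;). The sketch below wraps the same idea in a reusable function (the name is hypothetical) and uses -exec ... + instead, so find passes many files to each gzip call. Each file is still compressed into its own .gz file.

```bash
#!/usr/bin/env bash
# Sketch: gzip every regular file under DIR into its own .gz file, skipping
# files that are already compressed (.gz, .zip, .tgz).
gzip_tree() {
  local dir=${1:-.}
  find "$dir" -type f ! -name '*.gz' ! -name '*.zip' ! -name '*.tgz' -exec gzip -- {} +
}

# Usage: gzip_tree /path/to/archive
```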

9 Jan 2014

Should MySQL have its timezone set to UTC?

It seems that it does not matter which timezone the server uses, as long as the time is set correctly for that timezone, you know the timezone of the datetime values you store, and you are aware of the issues with daylight saving time.

On the other hand, if you control the timezones of the servers you work with, you can set everything to UTC internally and never worry about timezones and DST.

Here are some notes I collected on working with timezones, as a cheatsheet for myself and others. They might influence which timezone you choose for your server and how you store dates and times.

MySQL Timezone Cheatsheet

Notes:

Changing the timezone will not change the stored datetime or timestamp values, but selecting from timestamp columns will return a different datetime.
UTC does not use daylight saving time; GMT (the region) does, but GMT (the timezone) does not. (GMT is also ambiguous about the definition of the second, which is one reason UTC was introduced.)
Warning! UTC has leap seconds. These look like '2012-06-30 23:59:60' and are added irregularly, with about six months' notice, because of the slowing of the Earth's rotation.
Warning! Different regional timezones might produce the same datetime value because of daylight saving time.
The timestamp column only supports dates from 1970-01-01 00:00:01 to 2038-01-19 03:14:07 UTC.
Internally, a MySQL timestamp column is stored as UTC, but when selecting a date, MySQL automatically converts it to the current session timezone.

When storing a date in a timestamp, MySQL assumes that the date is in the current session timezone and converts it to UTC for storage.
MySQL can store partial dates in datetime columns; these look like "2013-00-00 04:00:00".
MySQL may store "0000-00-00 00:00:00" if you set a datetime column to NULL, unless you declared the column as nullable when you created it (the exact behavior depends on the SQL mode).
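The TIMESTAMP range corresponds to Unix times 1 through 2^31 - 1 = 2147483647, the largest value a signed 32-bit integer can hold. The shell sketch below checks the boundaries with GNU date (the -d @seconds syntax is a GNU extension). The variable and function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: MySQL TIMESTAMP range expressed as Unix seconds (1 .. 2^31 - 1).
TS_MIN=1
TS_MAX=$(( (1 << 31) - 1 ))   # 2147483647

# Print a Unix time as a UTC datetime (GNU date).
unix_to_utc() {
  date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Succeed only if the Unix time fits in a MySQL TIMESTAMP column.
fits_mysql_timestamp() {
  (( $1 >= TS_MIN && $1 <= TS_MAX ))
}

# Example:
#   unix_to_utc "$TS_MAX"                    # 2038-01-19 03:14:07
#   fits_mysql_timestamp 2147483648 || echo "out of range"
```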

To select a timestamp column in UTC format, no matter what timezone the current MySQL session is in:

SELECT 
CONVERT_TZ(`timestamp_field`, @@session.time_zone, '+00:00') AS `utc_datetime` 
FROM `table_name`

You can also set the server, global, or current session timezone to UTC and then select the timestamp like so:

SELECT `timestamp_field` FROM `table_name`

To select the current datetime in UTC:

SELECT UTC_TIMESTAMP();
SELECT UTC_TIMESTAMP;
SELECT CONVERT_TZ(NOW(), @@session.time_zone, '+00:00');

To select the current datetime in the session timezone

SELECT NOW();
SELECT CURRENT_TIMESTAMP;
SELECT CURRENT_TIMESTAMP();

To select the timezone that was set when the server launched

SELECT @@system_time_zone;

For example, this returns "MSK" or "+04:00" for Moscow time. There is (or was) a MySQL bug where, if the value was set to a numeric offset, it did not adjust for daylight saving time.

To get the current timezone offset from UTC:

SELECT TIMEDIFF(NOW(), UTC_TIMESTAMP);

It will return 02:00:00 if your timezone is +2:00.
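On the operating-system side, date's %z format gives the current UTC offset as +HHMM. The sketch below reformats it into the +HH:MM form that SET time_zone accepts, for example to compare with the TIMEDIFF() result. Like any numeric offset, the value is a snapshot and does not follow daylight saving time changes. The function name is our own.

```bash
#!/usr/bin/env bash
# Sketch: current OS UTC offset in MySQL's '+HH:MM' notation.
os_utc_offset() {
  local z
  z=$(date +%z)                        # e.g. +0200 or -0500
  printf '%s:%s\n' "${z:0:3}" "${z:3:2}"
}

# Usage:
#   echo "SET time_zone = '$(os_utc_offset)';"
```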

To get the current UNIX timestamp:

SELECT UNIX_TIMESTAMP(NOW())

To get the timestamp column as a UNIX timestamp

SELECT UNIX_TIMESTAMP(`timestamp`) FROM `table_name`

To get a UTC datetime column as a UNIX timestamp

SELECT UNIX_TIMESTAMP(CONVERT_TZ(`utc_datetime`, '+00:00', @@session.time_zone)) FROM `table_name`

Get a current timezone datetime from a UNIX timestamp

SELECT FROM_UNIXTIME(`unix_timestamp_int`) FROM `table_name`

Get a UTC datetime from a UNIX timestamp

SELECT CONVERT_TZ(FROM_UNIXTIME(`unix_timestamp_int`), @@session.time_zone, '+00:00') 
FROM `table_name`
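These conversions can be cross-checked from the shell. GNU date interprets a datetime string in whatever zone the TZ environment variable names, so converting a datetime from a given zone to a Unix time mirrors UNIX_TIMESTAMP(CONVERT_TZ(dt, zone, @@session.time_zone)). This is a sketch; the helper name is hypothetical, and the -d option is a GNU extension.

```bash
#!/usr/bin/env bash
# Sketch: Unix time for a datetime string interpreted in a given time zone (GNU date).
datetime_to_unix() {
  local datetime=$1 zone=${2:-UTC}
  TZ=$zone date -d "$datetime" +%s
}

# Example:
#   datetime_to_unix '2038-01-19 03:14:07' UTC   # 2147483647
#   date +%s                                     # current Unix time, like UNIX_TIMESTAMP()
```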

There are 3 places where the timezone might be set in MySQL:

Note: A timezone can be set in 2 formats:

an offset from UTC: '+00:00', '+10:00' or '-6:00'
as a named time zone: 'Europe/Helsinki', 'US/Eastern', or 'MET'

Named time zones can be used only if the time zone information tables in the mysql database have been created and populated.

In the file "my.cnf" (in the [mysqld] section):

[mysqld]
default_time_zone='+00:00'

@@global.time_zone variable

To see its current value:

SELECT @@global.time_zone;

To set it, use any of the following:

SET GLOBAL time_zone = '+8:00';
SET GLOBAL time_zone = 'Europe/Helsinki';
SET @@global.time_zone='+00:00';

@@session.time_zone variable

SELECT @@session.time_zone;

To set it, use any of the following:

SET time_zone = 'Europe/Helsinki';
SET time_zone = "+00:00";
SET @@session.time_zone = "+00:00";

Both @@global.time_zone and @@session.time_zone might return "SYSTEM", which means that they use the server's system time zone (the value of @@system_time_zone, taken from the operating system when the server started).

For named time zones to work, the time zone information tables must be populated: http://dev.mysql.com/doc/refman/5.1/en/time-zone-support.html
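On systems that ship a zoneinfo database, the tables are usually populated with the mysql_tzinfo_to_sql utility that comes with MySQL. The sketch below assumes the zoneinfo files live in /usr/share/zoneinfo and that the mysql client can connect as root; it will prompt for the password. Adjust both for your system. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: load the OS zoneinfo database into MySQL's time zone tables.
# Assumptions: zoneinfo lives in /usr/share/zoneinfo (or pass another path),
# and the mysql client can log in as root (it prompts for the password).
load_mysql_timezones() {
  local zoneinfo=${1:-/usr/share/zoneinfo}
  mysql_tzinfo_to_sql "$zoneinfo" | mysql -u root -p mysql
}

# Usage: load_mysql_timezones
```

Afterwards, a query such as SELECT CONVERT_TZ('2014-01-09 12:00:00', 'UTC', 'Europe/Helsinki'); should return a datetime instead of NULL.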

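Note: the following does not work and returns NULL, because TIMEDIFF() returns a time value such as 02:00:00 rather than an offset string such as '+02:00':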

SELECT 
CONVERT_TZ(`timestamp_field`, TIMEDIFF(NOW(), UTC_TIMESTAMP), '+00:00') AS `utc_datetime` 
FROM `table_name`

16 Jan 2014