Suppose you have a table where lower- and upper-case ASCII characters mean different things. Watch what the default case-insensitive collation does:
mysql> create table collation_test(value char(1));
Query OK, 0 rows affected (0.00 sec)
mysql> insert into collation_test(value) values ('T');
Query OK, 1 row affected (0.00 sec)
[A bunch more stuff]
mysql> select * from collation_test;
+-------+
| value |
+-------+
| T |
| T |
| T |
| t |
| t |
| F |
| F |
| f |
| f |
+-------+
9 rows in set (0.00 sec)
mysql> select value,count(*) from collation_test group by value;
+-------+----------+
| value | count(*) |
+-------+----------+
| F | 4 |
| T | 5 |
+-------+----------+
2 rows in set (0.00 sec)
mysql> alter table collation_test CONVERT TO CHARACTER SET latin1
    -> COLLATE latin1_general_cs;
Query OK, 9 rows affected (0.00 sec)
Records: 9 Duplicates: 0 Warnings: 0
mysql> select value,count(*) from collation_test group by value;
+-------+----------+
| value | count(*) |
+-------+----------+
| F | 2 |
| f | 2 |
| T | 3 |
| t | 2 |
+-------+----------+
4 rows in set (0.00 sec)
Thursday, August 28, 2008
Tuesday, August 19, 2008
MySQL Default values (string / int)
Had a great one today:
Create table mine (id integer);
insert into mine(id) values (0),(1),(2),(3),(0),(1);
select * from mine;
[Nets 0,1,2,3,0,1]
select * from mine where id = "Test";
[0,0]
select * from mine where id != "Test";
[1,2,3,1]
Labels: MySQL
Saturday, August 16, 2008
String / int columns
Found a table of ours:
hashid bigint
value_a varchar(16)
value_b varchar(16)
In MyISAM, yes, this is a dynamically sized table. Turns out the data was only ever integers. Altered the table to be
hashid bigint
value_a integer
value_b integer
Goes to fixed size rows which is great. Also dropped size from 411M to 350M.
Remember the golden rule:
Optimize your dataset before your hardware...
Attention job candidates
Simple question - I encourage you to add a comment with your own solution in any language:
I have a file with one word per line, N number of lines. Count the number of times your name exists in the file. Use any language you'd like, just give me the number.
cat filename | grep name | wc -l
fopen, fread, strstr, fclose
load file, select count(*)
As a software engineer/programmer it's not hard, so why have the last 8 candidates failed to answer this?!
Linux OOM-killer
Please make sure your OOM-killer settings are configured appropriately.
For us, MySQL is configured for ~40G of memory. A developer had a runaway Perl script that ate ~90G of memory before wiping out our production server. The OOM-killer should have tagged that as bad and killed it. Unfortunately, it didn't: the server ran out of memory and cratered. Surprised it didn't sprout legs and go Skynet on us.
Didn't help that upon reboot the RAID controller found a bad drive, causing table check/recover to be run while in degraded mode.
Labels: MySQL, Linux
Wednesday, August 13, 2008
Find the bug (Embedded C)
Found this gem a while ago.
void foo (int parameter)
{
    int i = 0;
    char buffer[64] = {};
    int temp = parameter;
    for (i=0;i<64;i++);
    {
        buffer[i] = 0;
    }
}
Labels: MySQL, C
SAN / Direct Attached storage
[update]
Been monkeying in the lab with some Sun/IBM/HP hardware - an EMC CX-80 and a few commodity MSAs. Overall testing: direct attached is better.
Setup: shelves of 16 drives, RAID5, EXT3, 3TB MySQL volume, nominal DB load (50/50 read-write balance).
Turns out that the direct attached RAID controller gets to dedicate all memory/throughput to itself (gee whiz!). The SAN splits its cache among everyone using it (configurable, but beside the point). With 10+ servers slamming the SAN, cache thrashing gets to the point where all the benefits of a SAN are moot.
[Original post]
Open Question:
SAN allows you to use hot backup tools.
DA means you eliminate a single point of failure.
SAN allows you to resize partitions if you failed to realize the actual growth rate.
DA might be a little more battle-tested in small-medium size business world.
I hope to update this in the future.
OLAP Linux filesystem settings
Before investing time in MySQL tweaks/settings -- properly configuring your I/O subsystem is key.
Regardless of underlying filesystem, if you write to many separate tables sequentially, you'll end up with disk fragmentation. Extent based allocation, and pre-alloc patches can serve to help, but increasing sequential disk accesses is critical.
If you're brave, one evil solution could be to place each table on its own partition.
A few other solutions:
1) noatime - a MUST
2) nodiratime - a MUST
3) turn off dir_index (ext3)
4) data=writeback if you have a battery backed RAID controller
5) Get your stride, RAID stripe, OS Page, and Database Page sizes aligned right. This is of huge importance.
6) Use memory/caches/anything to migrate your disks to sequential I/O. Prevent the head from seeking and you can improve throughput.
7) RAID10. One of my first databases utilized RAID6. The overhead regarding writes was abysmal.
8) RAID-N -- Have a hot spare. You don't want rebuilding to wait until you get around to finding a replacement drive.
9) RAID Controller - If you have a split Read/Write cache, allocate as much memory as possible to writes. If you have contiguous blocks, reads can be quick even if you have a cache miss.
10) Memory caching - Linux allows you to tune the dirty memory ratio. Do it! Standard installations are tuned towards desktop / OLTP servers.
11) Swappiness - Oh the religious debate this creates... Do you want your memory allocated to the application/database/data caching, or used so that the guy with an 8-hour stale SSH session can have snappy performance?
Labels: MySQL, Filesystem, Linux, OLAP
Tuesday, August 12, 2008
MySQL: How to use lots of memory
So obviously in a database, you want to leverage memory to reduce disk accesses.
InnoDB: Set innodb_buffer_pool_size as large as possible.
MyISAM: The plot thickens. Pre 5.0.52, you could only set key buffers to 4G; now you can go larger. To leverage memory, I do the following (change to >4G now that it's been fixed; the concept is the same). Note: preloading the caches takes some time, since it has to read the whole MYI from disk. Rebooting with this configuration is *expensive*, but the server is back to 100% at reboot instead of warming up. Remove the LOAD INDEX commands if that's too annoying for your purposes.
file [my.cnf]
# Initialization file
init-file = init.sql
# Global key cache
key_buffer_size = 4G
# Table based key caches:
[tablename1]_cache.key_buffer_size=4G
[tablename2]_cache.key_buffer_size=4G
[tablename3]_cache.key_buffer_size=4G
[tablename4]_cache.key_buffer_size=4G
file [init.sql]
CACHE INDEX [database1].[tablename1] IN [tablename1]_cache
CACHE INDEX [database1].[tablename2] IN [tablename2]_cache
CACHE INDEX [database1].[tablename3] IN [tablename3]_cache
CACHE INDEX [database1].[tablename4] IN [tablename4]_cache
LOAD INDEX INTO CACHE [database1].[tablename1]
LOAD INDEX INTO CACHE [database1].[tablename2]
LOAD INDEX INTO CACHE [database1].[tablename3]
LOAD INDEX INTO CACHE [database1].[tablename4]
MySQL Cluster
Serves a very good purpose in an OLTP environment. It's entirely useless for OLAP. Disk backed tables help, but you're still stuck with in-memory indexes. Pull the plug on the rack, cluster takes a looong time to come back.
Data retention, try purging a few thousand rows, you're limited by MaxNoOfConcurrentOperations. Allocating an object for each row to be inserted/deleted/returned is an impossibility for an OLAP system. The widely accepted solution is to chunk deletes (delete from [table] where date < [target] limit [N])
Quite unfortunate, given the possibility of parallelizing large GROUP BYs and other misc data aggregation.
Maybe in 6.0?
MySQL and I/O Schedulers
Been doing some experimenting in the lab. Appears that deadline works great for an OLTP database load, but CFQ performs better in an OLAP environment.
Will post some empirical results eventually, but here's my $.02:
Let the OS reorder / batch anything headed to disk. Re-order as much as possible to reduce head movement.