mysqlguy.net

Feed aggregator

How Linux iostat computes its results

Planet MySQL - January 9, 2010 - 8:53pm

iostat is one of the most important tools for measuring disk performance, which of course is very relevant for database administrators, whether your chosen database is Postgres, MySQL, Oracle, or anything else that runs on GNU/Linux. Have you ever wondered where statistics like await (the average wait for a request to complete) come from? If you look at the disk statistics the Linux kernel makes available through files such as /proc/diskstats, you won’t see await there. How does iostat compute await? For that matter, how does it compute the average queue size, service time, and utilization? This blog post will show you how those values are computed.

First, let’s look at the fields in /proc/diskstats. Their order and location vary between kernels, but the following applies to 2.6 kernels. For reads and writes, the file contains the number of operations, the number of operations merged because they were adjacent, the number of sectors, and the number of milliseconds spent. Those are available separately for reads and writes, although iostat groups them together in some cases. Additionally, you can find the number of operations currently in progress, the total number of milliseconds during which I/Os were in progress, and the weighted number of milliseconds spent doing I/Os. Those are not available separately for reads and writes.

The last one is very important. The field showing the number of operations in progress is transient — it shows you the instantaneous value, and this “memoryless” property means you can’t use it to infer the number of I/O operations that are in progress on average. But the last field has memory, because it is defined as follows:

Field 11 — weighted # of milliseconds spent doing I/Os
This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating.

So the field indicates the total number of milliseconds that all requests have been in progress. If two requests have been waiting 100ms, then 200ms is added to the field. And thus it records what happened over the duration of the sampling interval, not just what’s happening at the instant you look at the file. We’ll come back to that later.

Now, given two samples of I/O statistics and the time elapsed between them, we can easily compute everything iostat outputs in -dx mode; a worked sketch in code follows the list. I’ll take the fields slightly out of order to reflect how the computations are done internally.

  • rrqm/s is merely the delta in merged read requests divided by the number of seconds elapsed.
  • wrqm/s is similarly simple, and r/s, w/s, rsec/s, and wsec/s are too.
  • avgrq-sz is the number of sectors divided by the number of I/O operations.
  • avgqu-sz is computed from the last field in the file — the one that has “memory” — divided by the milliseconds elapsed. Hence the units cancel out and you just get the average number of operations in progress during the time period. The name (short for “average queue size”) is a little bit ambiguous. This value doesn’t show how many operations were queued but not yet being serviced — it shows how many were either in the queue waiting, or being serviced. The exact wording of the kernel documentation is “…as requests are given to appropriate struct request_queue and decremented as they finish.”
  • %util is the total time spent doing I/Os, divided by the sampling interval. This tells you how much of the time the device was busy, but it doesn’t really tell you whether it’s reaching its limit of throughput, because the device could be backed by many disks and hence capable of multiple I/O operations simultaneously.
  • await is the total time for all I/O operations summed, divided by the number of I/O operations completed.
  • svctm is the most complex to derive: it is the utilization divided by the throughput. You saw utilization above; the throughput is the number of I/O operations completed per second during the interval.
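
Putting that together, here is a minimal sketch in Python of the arithmetic described above, reading two samples of /proc/diskstats. The field positions assume a 2.6-series kernel, and the device name at the bottom is a placeholder.

#!/usr/bin/env python
# A sketch of iostat -dx arithmetic, not a drop-in replacement.
import time

def read_counters(device):
    # Per-device fields 4-14 of /proc/diskstats: reads, reads merged,
    # sectors read, ms reading, writes, writes merged, sectors written,
    # ms writing, I/Os in progress, ms doing I/O, weighted ms doing I/O.
    with open('/proc/diskstats') as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return [int(x) for x in fields[3:14]]
    raise ValueError('device not found: %s' % device)

def iostat_dx(device, interval=1.0):
    before = read_counters(device)
    time.sleep(interval)
    after = read_counters(device)
    d = [b - a for a, b in zip(before, after)]
    # d[8] (I/Os in progress) is a gauge, not a counter: its delta is useless.
    ios = (d[0] + d[4]) or 1        # completed reads + writes; avoid /0
    ms = interval * 1000.0          # elapsed wall-clock milliseconds
    print('rrqm/s   %10.2f' % (d[1] / interval))
    print('wrqm/s   %10.2f' % (d[5] / interval))
    print('r/s      %10.2f' % (d[0] / interval))
    print('w/s      %10.2f' % (d[4] / interval))
    print('avgrq-sz %10.2f' % ((d[2] + d[6]) / float(ios)))
    print('avgqu-sz %10.2f' % (d[10] / ms))          # the field with "memory"
    print('await    %10.2f' % ((d[3] + d[7]) / float(ios)))
    print('svctm    %10.2f' % (d[9] / float(ios)))   # utilization / throughput
    print('%%util    %10.2f' % (100.0 * d[9] / ms))

iostat_dx('sda')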

Although the computations and their results seem both simple and cryptic, it turns out that you can derive a lot of information from the relationship between these various numbers. This is one of those tools where a few lines of code have a surprising amount of meaning, which is left for the reader to understand. I’ll get more into that in the future.




Getting around optimizer limitations with an IN() list

Planet MySQL - January 9, 2010 - 6:21pm

There was a discussion on LinkedIn one month ago that caught my eye:

Database search by "within x number of miles" radius?

Anyone out there created a zipcode database and created a "search within x number of miles" function?
Thankful for any tips you can throw my way..

J

A few people commented that some solutions wouldn't scale. To understand why these sorts of geographic search queries are problematic in MySQL, it's best to show some execution plans on dummy data:

EXPLAIN SELECT * FROM coordinates FORCE INDEX (x_y_col_a) WHERE x BETWEEN 30 AND 40
AND y BETWEEN 50 AND 60 AND col_a = 'set1';
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | coordinates | range | x_y_col_a     | x_y_col_a | 38      | NULL | 4032 | Using where |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

EXPLAIN SELECT * FROM coordinates FORCE INDEX (x_y_col_a) WHERE x BETWEEN 30 AND 40;
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | coordinates | range | x_y_col_a     | x_y_col_a | 3       | NULL | 4032 | Using where |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.01 sec)

SELECT count(*) FROM coordinates FORCE INDEX (x_y_col_a) WHERE x BETWEEN 30 AND 40
AND y BETWEEN 50 AND 60 AND col_a = 'set1';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

SELECT count(*) FROM coordinates FORCE INDEX (x_y_col_a) WHERE x BETWEEN 30 AND 40;
+----------+
| count(*) |
+----------+
|     1664 |
+----------+
1 row in set (0.01 sec)

Did you notice that we estimate just as many rows on the first EXPLAIN as on the second one? That doesn't make any sense! The index covers x, y, and col_a and should be eliminating a lot of searching, since there is only one row which meets this condition!

The reason for this is simply a missing feature of the MySQL optimizer - and it has to do with using x BETWEEN 30 AND 40 (it's also true with x >= 30 AND x <= 40). A range predicate on a column prevents the optimizer from using the columns to its right in the index to narrow the scan; they can only filter rows after the fact. There is a workaround, but it's not pretty:

EXPLAIN SELECT * FROM coordinates WHERE x IN
(30.30,30.61,31.18,31.26,31.72,32.11,32.25,32.30,32.42,32.91,33.27,
33.69,33.79,33.93,34.62,34.78,35.10,35.41,36.62,36.93,37.17,38.93,39.20,
39.56,39.84,39.87) AND y IN (59.58,56.81,57.27,54.14,56.43,51.87,54.59,
59.56,57.42,54.13,56.79,59.45) AND col_a = 'set1';
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key       | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
|  1 | SIMPLE      | coordinates | range | x_y_col_a     | x_y_col_a | 38      | NULL |  312 | Using where |
+----+-------------+-------------+-------+---------------+-----------+---------+------+------+-------------+
1 row in set (0.00 sec)

The ugliest thing about this is that in real life you wouldn't know all the possible values of X or Y, so you could end up with a very big IN list. The workaround for that is to create steppings of the values X and Y that we can use for indexes:

ALTER TABLE coordinates ADD x_floor INT NOT NULL, ADD y_floor INT NOT NULL, DROP INDEX x_y_col_a,
ADD INDEX x_floor_y_floor_col_a (x_floor, y_floor, col_a);

UPDATE coordinates SET x_floor = FLOOR(x), y_floor = FLOOR(y);

EXPLAIN SELECT * FROM coordinates WHERE x_floor IN (30,31,32,33,34,35,36,37,38,39,40)
AND y_floor IN (50,51,52,53,54,55,56,57,58,59,60) AND col_a = 'set1'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: coordinates
         type: range
possible_keys: x_floor_y_floor_col_a
          key: x_floor_y_floor_col_a
      key_len: 40
          ref: NULL
         rows: 121
        Extra: Using where
1 row in set (0.00 sec)

Fantastic! The only remaining problem with this query is that it's not quite identical to our original. In this query 60.79 will be floored to 60 (and erroneously included in our results):

SELECT * FROM coordinates WHERE x_floor IN (30,31,32,33,34,35,36,37,38,39,40)
AND y_floor IN (50,51,52,53,54,55,56,57,58,59,60) AND col_a = 'set1';
+-----+-------+-------+-------+-------+---------+---------+
| id  | x     | y     | col_a | col_b | x_floor | y_floor |
+-----+-------+-------+-------+-------+---------+---------+
| 144 | 33.79 | 54.59 | set1  | NULL  |      33 |      54 |
|  38 | 39.20 | 60.79 | set1  | NULL  |      39 |      60 |
+-----+-------+-------+-------+-------+---------+---------+
2 rows in set (0.00 sec)

However, this is quickly fixed by re-including the original WHERE conditions (we just no longer use an index for them):

EXPLAIN SELECT * FROM coordinates WHERE x_floor IN (30,31,32,33,34,35,36,37,38,39,40)
AND y_floor IN (50,51,52,53,54,55,56,57,58,59,60)
AND col_a = 'set1' AND x BETWEEN 30 AND 40 AND y BETWEEN 50 AND 60\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: coordinates
         type: range
possible_keys: x_floor_y_floor_col_a
          key: x_floor_y_floor_col_a
      key_len: 40
          ref: NULL
         rows: 121
        Extra: Using where
1 row in set (0.00 sec)

SELECT * FROM coordinates WHERE x_floor IN (30,31,32,33,34,35,36,37,38,39,40)
AND y_floor IN (50,51,52,53,54,55,56,57,58,59,60)
AND col_a = 'set1' AND x BETWEEN 30 AND 40 AND y BETWEEN 50 AND 60;
+-----+-------+-------+-------+-------+---------+---------+
| id  | x     | y     | col_a | col_b | x_floor | y_floor |
+-----+-------+-------+-------+-------+---------+---------+
| 144 | 33.79 | 54.59 | set1  | NULL  |      33 |      54 |
+-----+-------+-------+-------+-------+---------+---------+
1 row in set (0.00 sec)

Conclusion:
My examples used only a small amount of data (16,000 rows) that fit in memory, but the original query would have done a full table scan if I hadn't used a FORCE INDEX hint. Add more data, and if X can't filter out enough rows by itself, this becomes a real problem.

Workarounds are all very good, but they also make applications more difficult to maintain. If you really want to do these types of queries, you should give Sphinx a try.

Entry posted by Morgan Tocker | 3 comments




Dennis On the Road to Recovery

Planet MySQL - January 9, 2010 - 12:59pm


Dennis Wolf, former MySQL CFO, has been undergoing plasmapheresis treatment over the last week, and despite a mild setback due to an infection, he will be checking out of the hospital today to continue rehab as an outpatient. Dennis reports that he has increased mobility in his leg and that the tests for Devic's Disease and Multiple Sclerosis have come back negative. So there's a bit of a mystery as to what caused this NMO flareup, but the fact that it's not Devic's disease is good news, as it means a recurrence is less likely.

Dennis will be continuing treatment as an outpatient for the next several weeks and will also be getting a second opinion from the experts up at UCSF. 

Marten Mickos put together a couple of nice posters that were hanging in Dennis' hospital room including the one above featuring the MySQL executive team (known informally as the 4th Box Club) and a quote from James Stockdale:

“You must never confuse faith that you will prevail in the end — which you can never afford to lose — with the discipline to confront the most brutal facts of your current reality, whatever they might be.”

Dennis, we are thrilled to hear of your progress!

Feel free to add your comments for Dennis below.



FOSDEM 2010: Python sneaks into the MySQL DevRoom!

Planet MySQL - January 9, 2010 - 8:31am

FOSDEM 2010, Sunday 7 February, the MySQL Developer Room packed with 12 talks! And this year we serve Python just before the lunch break.

In 20 minutes I'll try to give an overview of the drivers currently available for connecting your Python applications with MySQL. In case you're wondering, this will not revolve around MySQL Connector/Python alone!

We'll also go over some frameworks and tools like SQLAlchemy.
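
As a taste of the comparison, the well-known drivers implement the same DB-API 2.0 interface, so switching between them is mostly a matter of the import and the connect arguments. A minimal sketch, with credentials and schema as placeholders:

# Both drivers expose the DB-API 2.0 connect/cursor/execute interface.
import MySQLdb               # the classic C-based driver
import mysql.connector       # pure-Python MySQL Connector/Python

cnx1 = MySQLdb.connect(user='app', passwd='secret', db='test')
cnx2 = mysql.connector.connect(user='app', password='secret', database='test')

for cnx in (cnx1, cnx2):
    cur = cnx.cursor()
    cur.execute("SELECT VERSION()")
    print(cur.fetchone()[0])
    cur.close()
    cnx.close()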

20 minutes is not much, but it should be enough. I hope to get a similar talk accepted for the MySQL Conference & Expo 2010.

Shocked and Stunned (that code exists and does work)

Planet MySQL - January 8, 2010 - 8:06pm
#define READ_ALL 1      /* openfrm: Read all parameters */
#define EXTRA_RECORD 8  /* Reservera plats f|r extra record */

and later on….

  if (prgflag & (READ_ALL+EXTRA_RECORD))
      records++;

Feel free to think about that for a second.

(I have an urge to add this to questions asked in a job interview…)



Scaling to 20k Transactions per Second with Open Source - MySQL, Java, curl, PHP

Planet MySQL - January 8, 2010 - 7:21pm
I'll be giving a web presentation Jan. 19th 2010. If you would like to check it out please do!


Scaling to 20k Transactions per Second with Open Source - MySQL, Java, curl, PHP

Announcing the tracks of the "MySQL & Friends" Developer Room at FOSDEM 2010

Planet MySQL - January 8, 2010 - 5:55pm
We are happy to announce the selected sessions for our "MySQL and Friends" developer room at FOSDEM 2010 in Brussels, Belgium. It will take place on Sunday, 7th of February from 9:00-17:00 in Room AW1.121.

In total, we received 20 submissions from 15 speakers. We'd like to thank them very much for their great proposals!

As we only have 12 speaking slots (20 mins each) available that day, we first needed to perform a selection process. Unfortunately there wasn't enough time to perform a full-blown voting process that involved the community at large. Since we didn't want to do this in a completely closed committee, we decided to involve all speakers that submitted a talk in this.

After this voting process, the final candidates are (ordered by last name):

Speaker: David Axmark
Title: Drizzle, A MySQL fork for the Web
Abstract:

The Drizzle project is building a database optimized for Cloud and Net applications. It is being designed for massive concurrency on modern multi-cpu/core architecture. The code is originally derived from MySQL. The code has been simplified by removing many of the original features (Stored procedures, triggers, views, some types etc). Other features are being moved from the core to plugins.

Speaker: Piotr Biel
Title: Multi-Master Replication Manager for MySQL
Abstract:

  • Introduction to MySQL high availability with MMM
  • MMM architecture overview
  • pros and cons
  • examples of usage

Speaker: Ronald Bradford
Title: 10x performance improvements - A case study
Abstract:

Taking a client from a 700ms (at times 1-2 seconds) page load time to a consistent 60ms is a great success story for improvement in database performance. In this presentation we will outline the steps taken, and what can be applied to any website, as one model to use in evaluating your own website.

While part of a longer talk, we will be reviewing the highlights including:

  • Identify what is truly slow and high volume queries
  • Choosing the right indexes including unique indexes, concatenated and partial indexes
  • Choosing the right storage engines (MyISAM, InnoDB and Memory)
  • Removing load including synchronous/real-time queries, master load, locking and sharding
  • The best performance improvement for a SQL statement is to eliminate it
  • Quantifying improvements made

The 10 Points are:

  1. Monitor, Monitor, Monitor
  2. Identify problem SQL
  3. Analyze problem SQL
  4. Indexes
  5. Offloading master load
  6. Improving SQL
  7. Storage engines
  8. Caching
  9. Sharding
  10. Handling database maintenance

Bonus: Front end improvements

Speaker: Kris Buytaert
Title: MySQL HA overview
Abstract:

Database High Availability is often the core component in building a highly available infrastructure.

This presentation will guide you through the different options available, with their advantages and disadvantages, when choosing a method to set up a highly available MySQL installation.

We'll cover MySQL Cluster, MySQL DRBD, MultiMaster based HA setups and different others including a look at how to integrate them with the Operating System.

Speaker: Stéphane Combaudon
Title: Correcting replication data drift with Maatkit
Abstract:

Replication with MySQL is very easy to set up but also very easy to break. As soon as inconsistencies are introduced, the master and the slaves start being desynchronized - and most often, you have no way to be quickly and efficiently aware of the problem as MySQL offers no built-in mechanism to check data consistency between servers.

Fortunately Maatkit has a neat solution with 2 nice scripts: mk-table-checksum, a tool to detect inconsistencies between a master and its slaves, and mk-table-sync, which can automatically correct the problems detected with mk-table-checksum.

This session will show you first how to use both tools depending on your replication setup (master-slaves or master-master) and then how to get the most from the multiple options that are available.

Speaker: Marc Delisle
Title: State of phpMyAdmin
Abstract:

phpMyAdmin is still evolving. New features in version 3.3 include changes tracking, replication support and database synchronization. Other lesser known features like data transformation and graphical relational manager will be covered as well.

Speaker: Seppo Jaakola
Title: Galera Replication for MySQL
Abstract:

This session will present the current state of Galera replication project. The presentation will contain elements like:

  • High level overview of Galera replication method
  • Contents of recent MySQL/Galera 0.7 release (fully open source)
  • Feedback and experiences from the field
  • New benchmark results
  • Short term road map, 0.8 development status

Speaker: Vladimir Kolesnikov
Title: Effective SSD for your Database - a PBXT Primer
Abstract:

In this session it will be shown how to efficiently combine SSD and HDD storage to benefit the performance of a MySQL database. Examples involving the PBXT storage engine will be demonstrated.

Speaker: Giuseppe Maxia
Title: Sharding for the masses
Abstract:

The Spider storage engine, a plugin for MySQL 5.1 and later, is an extension of partitioning. Using this engine, the user can deal transparently with multiple backends in the server layer. This means that the data is accessible from any application without code changes. This lecture will briefly introduce MySQL partitioning, and then show how to create and use the Spider engine, with some practical examples. This talk covers the latest version of the Spider and the Vertical Partition engines, including the related technology of UDFs that come with the engines. Practical instructions on how to install and use these technologies will be provided during the session.

Speaker: Kristian Nielsen
Title: Beyond MySQL GA: patches, storage engines, forks, and pre-releases
Abstract:

There is more to the MySQL server than MySQL 5.1 GA.
There are a lot of patches available that are not included in the official MySQL source tree. There are several new or enhanced storage engines. There are a handful of branches of the MySQL source code maintained by community members. And there are official MySQL milestone releases and snapshots. But to benefit from all of this, one has to know that they exist, and know how to use them.
In my talk, I will first give an overview of what community enhancements of various kinds are available. I will then go through the options for using these enhancements, including using pre-built binaries from various sources, building from alternative source trees, building storage engine plugins, and patching upstream MySQL sources.

Speaker: Mikael Ronström
Title: The new MySQL thread pool design
Abstract:

This session will present a new thread pool design worked out in the last few months. The design will be presented together with its pros and possible cons. Benchmark data validating its relevance to users will also be presented.

The main benefit of a thread pool design is that users can now have thousands of connections to the MySQL Server in a scalable manner. It also provides performance that is more independent of the number of concurrently active connections than previously.

Speaker: Geert Vanderkelen
Title: MySQL and Python: an overview
Abstract:

There has been lots of chatter about other languages, but not much about Python. This talk will give an overview of what is available to connect to MySQL, and also introduce MySQL Connector/Python.

These talks will now be put into the room schedule on the MySQL Forge Wiki. We look forward to having these sessions! If you happen to be at FOSDEM, please stop by our Developer Room!



Flooding the World with Kindness

Yodel Anecdotal - January 8, 2010 - 5:18pm

By Connie Chan, Manager, Yahoo! for Good

In December, Yahoo! launched our You In? program at kindness.yahoo.com to encourage people around the world to spread happiness by doing random acts of kindness.  Thanks to you, our big-hearted Yahoo! community who helped spread the word, over 315,000 people from 11 countries updated their Yahoo! status with good deeds, and more than 500 stories were posted to the You In? Flickr Group.

Check out some of our favorites –

Padmaja bought books for students in a rural government school in Bangalore, India

Kastle handed out reusable shopping bags and included a handwritten holiday card and $5 bill at a Farmer’s Market in Santa Monica, California

Edgar bought a set of books for a complete stranger in Brooklyn, New York

Inspired by amazing stories like these, Yahoo! also did some of our own random acts of kindness.

When we heard that Bill anonymously paid for a family’s overdue water bills, Yahoo! paid the heating bills for 12 families from a small town in Kentucky who were at risk of having their heat turned off in the winter.

Yahoos hand-delivered 50 hot meals to farm labor families who didn’t have transportation to get to the food bank in Quincy, Washington.

And on the busiest travel day of the year, Yahoo! went to the San Jose and San Francisco International Airports to pay for people’s baggage fees.

But the kindness didn’t stop there!  We also sent 50 kids from a Boys & Girls Club in Washington D.C. to the Nutcracker Ballet, took low income seniors in elderly care facilities on holiday shopping sprees in San Diego and Miami, gave DVD collections to children’s hospitals in Portland and Cincinnati, and more.

So our new year’s resolution is to keep this warm fuzzy holiday spirit alive by spreading ripples of kindness through our year-round Purple Acts of Kindness program.  You In?

How to SHOW PROCESSLIST Every .5 Seconds

Planet MySQL - January 8, 2010 - 4:33pm
Yes, it has been a while since I last posted, so we'll see how this year goes... Just the other evening I came across something that I should have done much sooner: run SHOW PROCESSLIST every half second without hitting "ENTER" repeatedly.

The Setup

1) Create a temporary user with a non-sensitive password on localhost with SUPER privs. Yes, this is crazy, but I did say *temporary*, right?

mysql> GRANT
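
The post is cut off above mid-statement. One way to poll the process list at half-second intervals (a sketch using MySQL Connector/Python; the user and password are placeholders, and the account needs the PROCESS privilege to see other users' threads):

# Print SHOW FULL PROCESSLIST every half second until interrupted.
import time
import mysql.connector

cnx = mysql.connector.connect(user='monitor', password='secret',
                              host='127.0.0.1')
cur = cnx.cursor()
try:
    while True:
        cur.execute('SHOW FULL PROCESSLIST')
        for row in cur.fetchall():
            print(row)
        print('-' * 72)
        time.sleep(0.5)
except KeyboardInterrupt:
    pass
finally:
    cur.close()
    cnx.close()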

Repeatable read versus read committed for InnoDB

Planet MySQL - January 8, 2010 - 3:57pm
There is a lot of documentation available on transaction isolation levels for InnoDB, including the InnoDB transaction model, a description of the isolation levels, notes on locks, locks set by statements and notes on consistent non-locking reads. There is another issue that has not been documented. The code that creates a read view (snapshot) for InnoDB transactions is a source of mutex contention on kernel_mutex. This code is run once per statement for read-committed and once per transaction for repeatable-read. The function read_view_open_now is run to create a snapshot. It copies the list of uncommitted transactions into a per-transaction data structure. Memory is allocated from the per-transaction heap to store the copied data, and that allocation might require a call to malloc. All of this work is done while holding kernel_mutex, and other threads are unlikely to get much work done in InnoDB while kernel_mutex is locked. Subscribe to feature request 49169 if you are interested in this.

I am not sure if there is an easy fix. Work can be done to reduce the chance of allocating memory while copying the open transaction list. But splitting kernel_mutex into several locks might be hard.

When repeatable-read is used there are fewer calls to read_view_open_now and there is less contention on kernel_mutex. I ran sysbench to measure the difference in performance. These are results from sysbench oltp using repeatable-read versus read-committed with MySQL 5.0.84. The numbers are transactions per second for 1, 2, 4, 8, 16, 32, 64, 128, 256 and 512 concurrent users. The MySQL server and sysbench clients ran on the same 8-core x86 server:

1130 1106 1074 1066 1039 1004  repeatable-read
1094 1075 1046 1011  957  886  read-committed

The sysbench command line:

sysbench --test=oltp --oltp-table-size=2000000 --max-time=60 \
  --max-requests=0 --mysql-table-engine=innodb --db-ps-mode=disable \
  --mysql-engine-trx=yes --oltp-read-only --oltp-dist-type=uniform \
  --oltp-range-size=1000 --num-threads=$n --seed-rng=1 run

my.cnf settings:

[mysqld]
innodb_buffer_pool_size=2000M
innodb_log_file_size=100M
innodb_flush_method=O_DIRECT
innodb_thread_concurrency=0
max_connections=2000
table_cache=2000
innodb_max_dirty_pages_pct=80
# use one of these
# transaction_isolation = READ-COMMITTED
# transaction_isolation = REPEATABLE-READ

Log Buffer #174: a Carnival of the Vanities for DBAs

Planet MySQL - January 8, 2010 - 2:45pm

Happy New Year to all our readers! Welcome to 2010 and the 174th edition of Log Buffer, the weekly review of database blogs.

MySQL

The MySQL ’sphere since the holidays has been thick with posts on the matter of Oracle’s purchase of Sun, and thereby of MySQL. And in particular, there’s been a lot of talk about MySQL founder Monty Widenius’s response. I call all of this the . . . 

Monty My-Thon

On the 28th of December, Monty framed the issue thus: Help keep the Internet free.

Singer Wang of Pythian, in reply, offers his perspective on GPL/ASL/BSD License Misconceptions and MySQL.

On Poo-tee-weet, Lukas Kahwe Smith is heard to say, Come on Monty . . .  “What on earth is Monty . . . thinking? How can you spin around 180 and expect to come off believable? How can suddenly the GPL be the wrong choice? How can suddenly OSS depend on proprietary sales?”

On the WireLust blog, Terrence Curran writes, Monty Widenius is trying to regain control of MySQL and why this is bad for OSS.

Kristian Nielsen shares some Oracle speculations, stating, “I think it is basically a matter of obtaining control over MySQL.”

Antony Curtis throws in his two cents: “The topic of today is [Monty's] ‘Save MySQL’ campaign and how I believe it is unnecessary.  . . .  In fact, I believe that it could be harmful.”

All that aside, things keep rolling, and DBAs keep DBAing. Simon Mudd shared his thoughts and some suggestions on managing MySQL grants.

On someGreatTechName, Piotr Jasiulewicz shows how to get data without reading it – the power of covering indexes in MySQL.

Geert Vanderkelen, Some Abstract Type, has the coolest-looking rows ever in his post, A chessboard in MySQL: make your moves.

Peter Zaitsev of the MySQL Performance Blog lays out the principles of upgrading MySQL, “ . . . a very interesting task as you can approach it with so much different ‘depth’. For some this is 15 minutes job for others it is many month projects. Why is that?”

SQL Server

On the SQL Server side, Aaron Bertrand likewise shares his experiences upgrading 2005 => 2008, describing the steps he took in his careful crossover.

Aaron also has a quick poll: what is your favorite Management Studio tip or trick?

Dan Jones has a question too: “ . . . there are two types of DBAs: those who are myopic and those who are leaders.” What kind of DBA are you?

Simon Sabin wonders aloud, Should PASS hold the conference on the East coast?

Simon also has his latest TSQL challenge – remove duplicates from a string.

If that’s not enough T-SQL for you, Adam Machanic has issued his invitation for T-SQL Tuesday #002.

Oracle

Let’s begin with nothing. Tanel Poder wishes to remind us that NULL is not zero!, with, “an example [of] how misunderstanding NULLs may cause your application to return different results than what was intended.”

Martin Widlake has been busy decoding high_value and low_value for us. He writes, “The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us . . .  What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course.”

Jonathan Lewis clarifies copy stats. “ . . . someone was having trouble,” he writes, “copying stats from one index to another using the import_index_stats and export_index_stats procedures from package dbms_stats modifying the contents of their ’stat table’ between the export and import.  . . .  Part of the problem with this approach is that you’re not really supposed to do what they were trying to do . . . ”

Guy Harrison, meanwhile, elucidates 11gR2’s IGNORE_ROW_ON_DUPKEY_INDEX hint, “ . . . [one] of the strangest new features in 11GR2 . . .  Why is this so strange? Mainly because unlike almost all other hints, this hint has a semantic effect: it changes the actual behavior – not just the optimization – of the SQL.”

Walking in the footsteps of Vasco da Gama, Luis Moreno Campos proclaims himself the first Portuguese Oracle nerd to unpack an Oracle-Sun Exadata v2. Congratulations, Luis! I think.

And we close this edition of Log Buffer with Jonathan Lewis, who on first principles, “ . . . just had to start the new year with a little humour,” courtesy of Og, Sumerian DBA.

That’s all for now. If I’ve missed your favourite DB blog from the last week, please leave a comment. See you again for LB #175.



MySQL and PostgreSQL jobs on the Rise, Oracle job postings decline

Planet MySQL - January 8, 2010 - 2:09pm

This tweet from former MySQL AB CEO Mårten Mickos caught my eye. It shows a trend of increasing demand for MySQL and PostgreSQL expertise, while job postings for those with Oracle and Ingres expertise declined.



I was a little shocked by the graph until I realized it shows just the trend. I then looked at the absolute number of jobs requesting database expertise, and the story quickly becomes clear….



So if you are looking for a DB admin job, the overall number of jobs in that sector is overwhelmingly Oracle jobs, but the growth is in MySQL and PostgreSQL expertise.




451 CAOS Links 2010.01.08

Planet MySQL - January 8, 2010 - 12:24pm

Google unveils the Nexus One. RMS explains his position on dual licensing. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

For the latest on Oracle’s acquisition of MySQL via Sun, see Everything you always wanted to know about MySQL but were afraid to ask

# Google launched the Nexus One Android phone.

# Richard Stallman explained his position on selling exceptions to the GNU GPL.

# Novell’s chief technology and strategy officer for open source, Nat Friedman, left the company.

# Dirk Riehle made “The Economic Case for Open Source Foundations”.

# InformationWeek published an interview with Red Hat CEO, Jim Whitehurst, on the recession, virtualization and Steve Ballmer.

# Lenovo introduced the Lenovo Skylight, a Linux- and ARM-based smartbook device and showcased the IdeaPad U1 hybrid notebook, which is both a Windows notebook and a Linux tablet.

# The Palm webOS developer program is now open. Membership fee waived for developers of open source apps.

# Dave Rosenberg speculated on why we have not seen more open source acquisitions.

# MSI and Novell announced the upcoming availability of SUSE Moblin preloaded on the MSI U135 netbook.

# Lucid Imagination released its LucidWorks Certified Distribution for Solr 1.4.

# Acquia announced that it grew to 400 paying customers in 2009.

# Jaspersoft’s CEO, Brian Gentile, targeted 50% growth in 2010, following 60% in 2009.

# Andy Updegrove reviewed the CodePlex Foundation’s progress, while Sam Ramji reflected on its first 120 days.

# Wipro joined the Open Handset Alliance.

# WaveMaker claimed to have doubled annual revenues and achieved profitability in 2009.

# Likewise predicted at least 100% sales growth in 2010.



InfiniDB vs. A Leading Row-Based Database

Planet MySQL - January 8, 2010 - 8:31am

If you’ve wondered whether a MySQL-based column database can hold its own against one of the big, leading row-based databases in a benchmark with terabytes of data and serious queries, well, now you at least have some information to consider. We recently commissioned one of the top data warehouse gurus out there – Bert Scalzo – to test InfiniDB against a leading row-based database to see how well we’d do. Bert’s an expert not only in data warehousing but also in Read More...

Don’t fear the reaper. Why FOSS should not fear M&A by proprietary vendors

Planet MySQL - January 8, 2010 - 5:38am

A couple of posts have been published recently worrying about the impact of more open source specialist vendors being acquired by proprietary vendors.

This is an issue that crops up occasionally. Usually when a major acquisition has been announced, and the current questioning seems to be driven by the ongoing saga of Oracle-Sun-MySQL, as well as the rumoured purchase of Zimbra by VMware.

While fear of the unknown is understandable, to my mind the concern about open source specialists being acquired by proprietary vendors is driven by parochialism and misplaced assumptions about the rate of acquisitions and the acquiring company’s intentions.

For a start the statistics suggest that acquisitions involving open source vendors have declined in recent years (contrary to our expectations to be honest). According to our preliminary figures there were 24 M&A deals involving open source vendors in 2009, compared to 29 in 2008 and 35 in 2007. Dave Rosenberg makes the case that we have seen less open source M&A than we might have expected.

There is always the fear, however, that a proprietary vendor could acquire an open source rival in order to shut it down. This is a theory we at The 451 Group investigated last year via a TechDealMaker service report asking “Could an open source project survive a hostile acquisition?” (451 clients can access the report here).

Looking at the history of M&A involving open source vendors we were unable to identify a single example of a proprietary vendor acquiring an open source project in order to kill it off.

Another significant fear involving open source acquisitions is that the acquiring company will suddenly change the licensing and/or pricing in order to generate revenue from users of the open source project.

To me this is a fear based on a false assumption that the only way to monetize open source is directly. If we look at the strategies used by proprietary vendors to generate revenue from open source (as we did in our Market Insight Service report “How third parties generate revenue from open source“, which was itself adapted from our Open Source is Not a Business Model CAOS report) we find that they are more likely to do so indirectly via complementary products and services.

In contrast open source specialist vendors have no choice but to attempt to monetize the open source software directly, either through support or proprietary licensed add-ons, and we have observed that this creates an inherent tension.

There is also a false assumption that open source specialist vendors are more committed to an open source “philosophy”. Some are, to be sure, but some simply see open source as a means to an end - treating it as a license tactic that disrupts competitors and expands potential adoption. There is nothing inherently wrong with that, but it does mean that for a great many open source “projects” the idea of the development community is a myth.

As previously discussed, Matt Asay noted last year that “vendors that have proprietary selling points elsewhere don’t need to control open-source code.”

In fact, I would suggest that vendors with proprietary selling points elsewhere have more to gain from releasing control of an open source project. Dirk Riehle explained the financial benefits this week with his Economic Case for Open Source Foundations, including sharing development expenses, increasing profits per sale, increasing sales, and expanding the addressable market.

The fact that proprietary vendors have proprietary selling points elsewhere means that they are also in a better financial position to trade control for community via a foundational approach, in contrast to open source specialists.

There may well be situations where the acquisition of open source specialists by proprietary vendors might give cause for concern, but I believe it is wrong to assume that the impact will be negative. While many open source specialists might have something to fear regarding increased M&A activity, in the broader context open source software has more potentially to gain from the increased involvement of proprietary vendors than it has to lose.



Going chemical: SQLAlchemy and MySQL Connector/Python!

Planet MySQL - January 8, 2010 - 5:30am

Last week I took SQLAlchemy v0.6 out of its trunk and tested it again with our MySQL Connector/Python. And surprise! SQLAlchemy comes with a 'dialect' supporting it! Nice!

However, we're still a long way from making it work. I found some additional problems which need to be fixed first.


shell> nosetests --dburi=mysql+mysqlconnector://root:@localhost/sqlalchemy \
test/dialect/test_mysql.py
..
Ran 32 tests in 4.507s

FAILED (errors=6, failures=1)

Oh, there were more errors and failures, all due to bugs in MySQL Connector/Python. There have already been some modifications to the dialect (e.g. a name change), and there might be some more.
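
For anyone following along, pointing SQLAlchemy at Connector/Python is just a matter of the URI scheme used in the test run above. A minimal sketch against the 0.6 trunk, with credentials and schema matching the dburi above:

# SQLAlchemy 0.6 engine using the bundled mysqlconnector dialect.
from sqlalchemy import create_engine

engine = create_engine('mysql+mysqlconnector://root:@localhost/sqlalchemy')
print(engine.execute('SELECT VERSION()').scalar())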

If we get this sorted, TurboGears should work too!



Star Schema Benchmark: InfoBright, InfiniDB and LucidDB

Planet MySQL - January 8, 2010 - 12:51am

In my previous rounds with data-warehouse-oriented engines I used a single table without joins, and with a small (as DW goes) data size (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). Addressing these issues, I took the Star Schema Benchmark, which is a TPC-H modification, and tried running queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB; I will explain why later. Again, the primary goal of the test was not just to get numbers, but to understand the specifics of each engine and their ability to handle this amount of data and execute the queries.

All the details I have are available on our Wiki at http://www.percona.com/docs/wiki/benchmark:ssb:start, and the specification of the benchmark is at http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf.

I generated data with scale factor = 1000, which gave me 610GB of data in raw format, and loaded it into each engine.

Here the difference between engines comes into play. While InfoBright and InfiniDB do not need indexes at all (you actually can't create indexes there), they are needed for LucidDB and MonetDB, and that changes load time and data size after load significantly. The numbers I put in the results do not include indexing time, but it should also be considered.

And indexes are exactly the reason why I could not get results for MonetDB; there I faced an issue I was not prepared for. MonetDB requires that the index fit into memory during the indexing procedure, and for 610GB of data the index may reach 120GB, and I did not have that amount of memory (the box has only 16GB of RAM). MonetDB experts recommended extending the swap partition to 128GB, but my partition layout was not prepared for that; I just did not expect to need a big swap partition.

Loading
So, load time. InfiniDB can really utilize all available cores/CPUs in the system (I ran the benchmark on an 8-core box), which allowed it to load data faster than the other engines. LucidDB and MonetDB also have multi-threaded loaders; only InfoBright ICE used a single core.

InfiniDB: 24 010 sec
MonetDB: 42 608 sec (without indexes)
InfoBright: 51 779 sec
LucidDB: 140 736 sec (without indexes)

I should note that the time to create indexes in LucidDB was also significant, exceeding the loading time. A full report on indexes is available here: http://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start

Data size

Size after load is also an interesting factor. InfoBright is traditionally good at compression, though the compression rate is lower than in the case of the AirTraffic table. I was told this is because the lineorder table does not come in sorted order, which one would expect in real life. Actually, I heard the same complaint from InfiniDB experts: if the lineorder data is put in sorted order, loading time can decrease significantly.

Data size after load:
InfoBright: 112G
LucidDB: 120GB (without indexes)
InfiniDB: 626GB
MonetDB: 650GB (without indexes)

Query times

Now for the query times. You can find the full results on the page http://www.percona.com/docs/wiki/benchmark:ssb:start, and the graph is below. A couple of comments from me:

InfoBright was fully bound to a single CPU during all queries. I think the fact that the engine can use only one CPU/core is becoming a significant limitation for them. For query 3.1 I got a surprising result: after 36 hours of work I got an error saying the query can't be resolved by the InfoBright optimizer and that I need to enable the MySQL optimizer.

InfiniDB, by contrast, was I/O-bound, and processed data fully utilizing sequential reads, reading data at 120MB/s. I think that allowed InfiniDB to get the best time on most queries.

LucidDB at this stage can also utilize only a single thread, with results sometimes better and sometimes worse than InfoBright's.

Results:

Query  InfoBright                                           InfiniDB                         LucidDB
Q1.1   48 min 21.67 sec (2901.67 sec)                       24 min 26.05 sec (1466.05 sec)   3503.792 sec
Q1.2   44 min 55.37 sec (2695.37 sec)                       24 min 25.83 sec (1465.83 sec)   2889.903 sec
Q1.3   45 min 53.49 sec (2753.49 sec)                       24 min 27.25 sec (1467.25 sec)   2763.464 sec
Q2.1   1 hour 54 min 27.74 sec (6867.74 sec)                19 min 44.35 sec (1184.35 sec)   9694.534 sec
Q2.2   1 hour 13 min 33.15 sec (4413.15 sec)                19 min 49.56 sec (1189.56 sec)   9399.965 sec
Q2.3   1 hour 8 min 23.41 sec (4103.41 sec)                 19 min 52.27 sec (1192.25 sec)   8875.349 sec
Q3.1   NA                                                   19 min 11.23 sec (1151.23 sec)   16376.93 sec
Q3.2   3 hours 30 min 17.64 sec (12617.64 sec)              19 min 28.55 sec (1168.55 sec)   5560.977 sec
Q3.3   2 hours 58 min 18.87 sec (10698.87 sec)              19 min 58.29 sec (1198.29 sec)   2517.621 sec
Q3.4   1 hour 41 min 41.29 sec (6101.29 sec)                12 min 57.96 sec (777.96 sec)    686.202 sec
Q4.1   8 hours 53 min 52.55 sec (32032.55 sec)              32 min 57.49 sec (1977.49 sec)   19843.213 sec
Q4.2   5 hours 38 min 7.60 sec / 5 hours 36 min 35.69 sec (20195.69 sec)   33 min 35.45 sec (2015.45 sec)   15292.648 sec
Q4.3   12 hours 58 min 4.27 sec (46684.27 sec)              33 min 47.32 sec (2027.32 sec)   7241.791 sec

Graph with results (time in sec, less time is better)

Conclusions

  • InfiniDB is doing just great, using the available CPU cores and the full IO bandwidth reading from disk. You can see more details on InfiniDB scalability on InfiniDB's blog http://infinidb.org/infinidb-blog/mysql-parallel-query-processing-of-ssb-queries-via-infinidb-.html
  • The SSB benchmark may not be a good fit for InfoBright; the synthetic nature of the benchmark does not allow InfoBright to show better results. But I hope InfoBright will be able to use multiple cores / multiple disks soon.
  • I'd like MonetDB to be able to use disk to build indexes, not only rely on available memory.
  • Taking the complaints about SSB into account, I am looking to get another, more realistic dataset and compare a bigger set of available DW solutions.
Entry posted by Vadim | 11 comments



A Quick Look Back: Academic Relations 2009

Yodel Anecdotal - January 7, 2010 - 9:35pm

Over the years, Yahoo! Labs has partnered closely with many universities and colleges. As a company, we want to invent the sciences needed for the next-generation Internet. Working with the faculty, researchers, and students of the world’s great universities is one of the best ways to meet that goal.

Looking back, 2009 was our most successful year to date. Through a variety of programs, we brought together a lot of smart folks and important ideas. As we head into 2010, we want to share our excitement about these programs and events.

Hack U:
Our University Hack Day competitions (Hack U) brought Yahoo!’s open technology, top developers, and hack spirit to campuses for technical talks and a 24-hour student programming competition. Students from 11 universities in the U.S., Canada, and India built more than 300 hacks. Winners from each campus participated in the New York Open Hack Day, ultimately winning 6 of the 11 categories.

Yahoo! Big Thinkers Series:
As part of our Big Thinkers distinguished speaker series, Yahoo! brought the brightest minds in the academic community to our campus for talks, and we’ve made those lectures available on the Web for all to see and enjoy.
Next year’s Big Thinkers series should be just as captivating — we’ve got a great lineup for 2010.

Webscope:
Our Webscope™ program offers a reference library of 24 interesting and scientifically useful datasets we’ve made available for noncommercial use by academics and other scientists. More than 880 academic researchers have used the datasets, resulting in 28 technical papers, journal articles and theses so far. Very few companies have the resources and global scale to help academics and students interact with the types of real-world datasets it takes to spark innovation, but Yahoo! is one of them and it’s really paid off.

Key Scientific Challenges (KSC):
Our Key Scientific Challenges program partnered with 21 graduate students in 2009. We gave them scholarships, plus the opportunity to work closely with Yahoo!’s scientists on solving some of the biggest challenges the Web offers. The winners attended the KSC Graduate Student Summit, where they presented the fruits of their labor to fellow students and other Yahoo! researchers. It was the first of what will surely be many peer-reviewed conferences for these talented folks.

Cloud Computing Research:
Our M45 Cloud Computing Cluster is a 4,000-processor testbed being used in academia for the advancement of cloud computing research and education. Faculty members at Carnegie Mellon University have written 40 technical publications based on research performed on M45, and the cluster is now being used by three additional universities. Yahoo! is also part of both the Open Cirrus testbed and the Open Cloud Consortium.

Global Impact – Yahoo! Days in Haifa:
In November we held two very successful “Yahoo! Days” at major Israeli universities – Tel Aviv University and Technion, the Israel Institute of Technology. Yahoo! scientists from around the world joined the event, which featured a keynote by Prabhakar Raghavan, the head of Yahoo! Labs. Pictures from the Tel Aviv and Technion events are available on Flickr.

Netflix Prize-winner Yehuda Koren’s Campus Tour:
Yehuda Koren, a research scientist with Yahoo! Labs in Israel, was part of a team that spanned countries, time zones, and companies, collaborating over a three-year period to win the Netflix Prize, one of the most well-publicized and interesting machine learning contests ever conceived. It’s a story with surprises, twists and turns, game-playing, late nights, and computational brute force. There’s also deep science behind it all — science that will drive future innovation on the Web. Naturally, Yahoo! thought it was the kind of story that students and faculty at some of the world’s best universities would like to hear in person. Since the award was announced in September 2009, Yehuda has visited universities including MIT, Harvard, Carnegie Mellon, Columbia, Rutgers, Princeton, Penn, and GA Tech. And he’ll be back at it in 2010.

Purple Footprints:
And, of course, we continued to sponsor our campus seminar series, leaving “purple footprints” at Carnegie Mellon University, MIT, and University of Illinois at Urbana-Champaign. All of those seminars have been recorded and are available online. Yahoos also presented more than 100 lectures, seminars, workshops and training sessions at campuses worldwide.

If you want to learn more about Yahoo! Academic Relations programs and how you can participate in 2010, please visit us.

Ron Brachman
Vice President
Yahoo! Labs

sar-sql Has A Wiki

Planet MySQL - January 7, 2010 - 6:13pm
I finally settled on a wiki for sar-sql, using Ubuntu's own wiki. Right now it only has my regular installation procedure. I will enhance it and keep adding items as my time allows. I hope it will help shape the future of the script.

Enjoy it with responsibility.

PS: I use the Kubuntu format because it is my desktop of choice.

Fulfillment Issues -- getting back on the right foot

Planet MySQL - January 7, 2010 - 5:41pm
MySQL Certification had a very busy 2009 with a great deal of change. Not all of the changes happened gracefully, and now I am correcting problems with the fulfillment system. Regular readers will remember that MySQL Certification was integrated into the systems used by Sun Learning in mid-2009. Where things got shaky was in the processing of DBA 5.1 certificates.

Previously Ricky Ho and I oversaw data entry, database updates, and fulfillment. The deeply missed Ricky did the heavy lifting in the printing and mailing of certificates. The test data from the multiple choice exams was flowing properly into the new (for MySQL) system but not for the hands-on DBA 5.1 exam. I have been going through the certification records and manually entering the data for the 5.1 DBA certificate holders. It will take a week or so to get things caught up. Meanwhile, if you are still waiting for a certificate for your MySQL 5.1 certification, please let me know at [email protected].

MySQL Certification exams will be offered at the 2010 O'Reilly MySQL Conference. More details will be posted here as they become available.

About Me

Jay Janssen
Yahoo!, Inc.
jayj at yahoo dash inc dot com

MySQL
High Availability
Global Load Balancing
Failover
