mysqlguy.net

Feed aggregator

Virtual squares - Taking virtualization to new limits

Planet MySQL - May 27, 2008 - 8:30am

During the Italian Free Software Conference in Trento, I attended an amazing presentation on virtual components.
Renzo Davoli, professor at Bologna University and hacker, has stretched the concept of virtualization to new limits. Virtual Square is a project aiming at developing more dynamic implementations of virtual entities, which eventually get separated from the concept of operating system and root privileges.

The coolest aspect of this project is the virtualization of single elements - a disk drive, a network port, a file system - without root privileges, and with no impact on other users.
Virtualizing single elements makes life easier for demanding users, and quieter for their neighbors, who won't suffer the massive reduction of overall resources that happens with normal virtualization of operating systems.
Think of the applications: for example, it would be easier to establish dedicated quotas for database and web server users, with better security, easier maintenance, and without creating new OS users. MySQL, with its limited interface for user resources, would surely benefit from this system. There is a lot of potential in this idea. Let's hope it is pushed beyond academic circles.
Categories: MySQL

Please give us your feedback by taking the MySQL Magazine Survey!

Planet MySQL - May 27, 2008 - 5:24am

If you are working with MySQL as a DBA or developer, I'd like to encourage you to consider taking the MySQL Magazine Survey, which was compiled by Keith Murphy and Mark Schoonover.

The survey takes around 10-15 minutes to complete and runs until June 16th. The results will be published in the summer issue of MySQL Magazine, due on July 15th. The questions cover a broad range of topics, from your MySQL experience and job description, through connectors and languages, to operating systems and MySQL versions.

Thanks in advance for your support and input! The results of this survey will be interesting for us as well.

Categories: MySQL

Slides on Q4M

Planet MySQL - May 27, 2008 - 4:27am

Today I had a chance to explain Q4M in detail, and here are the slides I used.

It covers what a message queue (generally) is, the internals of Q4M, how it is used as a pluggable storage engine for MySQL, and a couple of usage scenarios. I hope you will enjoy reading it.

Categories: MySQL

Students: OpenSolaris, NetBeans blogging contest

Planet MySQL - May 27, 2008 - 12:15am

As some of you may have noticed recently, I’ve started playing around (I can’t really say using, seeing that it’s still not a daily-basis kind of thing for me yet) with OpenSolaris and NetBeans. Now, Sun is encouraging students to use, review, and blog about these two great products - check out the student reviews contests.

If you’ve not tried OpenSolaris 2008.05, you should probably give it a twirl. Sure, I’ve not installed it on bare-metal hardware yet (I’ve been travelling so much, I’m not anywhere near my machines), but it works fine inside VirtualBox. NetBeans is great if you’re writing Java, Ruby, or PHP and are connecting to a database - Connector/J for MySQL ships with it. If you’re trying it out, take a look at the NetBeans+GlassFish+MySQL bundle.

So, students out there, take a look at the Student Reviews Contest. You have till June 6, 2008. Here’s your chance to win anywhere between $100 and $250 (USD?), which really means a lot of beer ;)

(and here’s hoping we run something similar when MySQL 5.1 is GA… we used to run bug-finding contests where we gave away iPod Nanos, though I think Visa debit cards are so much more useful than iPods…)

Categories: MySQL

Multisourced Production Infrastructure: History, and a stab at the Future

Planet MySQL - May 26, 2008 - 10:21pm

Startups are pretty fascinating. I work for a startup, and one of my good friends works for another startup. I’ve also worked for 2 other startups, one during the first “bubble”, and another one a few years later. Oh my, how the world of web startups has changed in that time!

1999: You must have funding

The first startup I was ever involved in was a web startup. It was an online retailer. They were starting from nothing. My friend (a former coworker from an earlier job) had saved for years to get this idea off the ground. He was able to get a few servers, some PCs for the developers he hired, and he got the cheapest office space in all of NYC (but it still managed to be a really cool space, in a way that only NYC can pull off), and he hosted every single service required to run the web site in-house. If I recall correctly, he had a web and database server on one machine, and I believe the primary DNS server was on an old desktop machine he had lying around the house. This gave him the ability to build a complete, 100%-functional prototype, and use it to shop for funding.

It worked. They got funding, they bought more and bigger servers. They got UPSes for them (yay!), they got more cooling, a nicer office, and they launched the site, pretty much exactly as the prototype existed, and things were pretty stable. Unfortunately, the VCs who took seats on the board after the first round of financing didn’t understand the notion of “The Long Tail”, so they eventually went under, but that’s not the point.

The point is, that was 8 or 9 years ago. It cost him quite a good bit of his hard-earned savings just to get to a place where he could build a prototype. A prototype! He only really knew Microsoft products, and buying licenses for Microsoft SQL Server and the developers' tools (I forget what they were using as an IDE, but they were a ColdFusion shop) took quite a chunk of money. My friend really only had enough money to put together a prototype, and they were playing “beat the clock” — trying to get the prototype done, and shop for (and get) funding, before the money ran out, because they couldn’t afford the hardware, power, cooling, big-boy internet connection, and the rest of what goes into a production infrastructure. The Prototype->VC->Production methodology was pretty typical at the time.

2003: Generate Some Revenue

In 2003, a couple of years after the bubble burst, I was involved in another startup. This one was 100% self-funded, but it has been rather successful since. By this time, dedicated hosting was just affordable enough to be doable for a startup with some revenue coming in, and that’s what my friend did. He also outsourced DNS completely (through his registrar, if memory serves), but he still hosted his own email, backup, and some other services in-house. He had plenty of hiccups and outages in the first year, but overall it ran pretty well considering all of the things he *didn’t* have to be concerned with, like power, cooling, internet uplinks, cable trays, etc. The world was becoming a friendlier place for startups.

2008: Do it all, on the cheap

Nowadays, the world is a completely different place for startups, and a lot of this is due to the rich set of free (or very cheap) resources that make it possible to do a production launch without the VC funding that used to be required just to get the hardware purchased.

In 2008 you can outsource DNS for relatively little money, and it’ll have the benefit of being globally distributed and redundant beyond what you’re likely to build yourself. You can get Google Apps to host your email and share calendars and documents. You can store backups on Amazon’s S3. You can use something like Eclipse, Komodo Edit, or a number of language-specific environments like Wing IDE or Zend Studio to do “real development” (whatever your definition of that is) for free or relatively cheap. You can also get a free database that is reasonably well-stocked with features and capabilities, a free web server that runs 65%+ of the internet sites in existence, and if you have the know-how (or can get it), you can actually host anything you want, including your entire production infrastructure (within reason, and subject to some caveats) on Amazon’s EC2, for a cost which is tied to what you use, which is cheaper in a lot of cases than either buying or leasing a dedicated server. Multisourcing has arrived!

In looking at this progression from “you must have funding”, to “you’re going to need to generate a little revenue”, to “do it all, on the cheap”, the really obvious question this all raises is:

“Now what?”

Well, this whole 2008 situation is making things better, but… how do I put this… “It’s not soup yet”.

First of all, there is no single platform where you can realistically do everything. Google’s AppEngine is nice, but it has its limitations: you don’t have any control over the web servers that run it, so you can’t, say, add an Apache mod_rewrite rule, use a 301 redirect, or process your log files. Troubleshooting an application based solely on input from people who are having issues with it would be difficult.

Amazon’s service gives you far more control, and if you need it, that’s great, but it completely changes how you architect a solution. I think some of these changes are good, and are things we should all be thinking about anyway, but Amazon forces you to make decisions about how to plan for failure from the moment you decide to go this route — even if it’s just for prototyping, because until persistent storage on EC2 is a reality available to the entire user base, whenever an EC2 instance disappears, so does every bit of the data you added to it. You’ll have to start from scratch when you bring up another instance. You’re also going to have to add more scripts and utilities to your toolbelt to manage the instances. What happens when one disappears? How do you fail over to another instance? How can you migrate an IP address from the failed instance to a running one? How you do all of these things, in addition to just building and installing systems, is different, and that means a learning curve, and that means overhead in the form of time (and expense, since you have to pay for the Amazon services to use them to learn on).

There are also now “grid” solutions that virtualize and abstract all of your infrastructure but give you familiar interfaces through which to manage it. One that I’ve used with some success is AppLogic, but other services like GoGrid and MediaTemple have offerings that emphasize different aspects of this niche “Infrastructure-as-a-Service” market. Choose very carefully: think about what you’ll want to do with your infrastructure, how you’ll manage and monitor it in addition to how you’ll deliver your application, and how flexible you can be within the confines of a grid solution before you commit, because the gotchas can be kind of big and hairy.

None of these are whole solutions. However, any of them could, potentially, some day, become what we would now call a “perfect solution”. But it still wouldn’t be perfect in the eyes of the people who are building and deploying applications that are having to scale into realms known seemingly only inside some brain vault that says “Google” on it. What those of us outside of that vault would like is not only Google-like scalability, but:

  • global distribution, without having to pledge our souls in exchange for Akamai services. It’s great that I can build an infrastructure on EC2 or GoGrid, but I’d like to deploy it to 10 different geographic locations and still control it centrally.
  • the ability to tightly integrate content delivery network services with the rest of our infrastructure (because CDNs are great at serving, but not so much at metrics)
  • SAN-like (not NFS-like) access to all storage from any/all systems, without sacrificing the IO performance needed to scale a database properly.
  • As an admin, I want access to all logs from all services I outsource, no matter who hosts them. I don’t believe I can access, for example, our Google Apps logs, but maybe I’ve forgotten to click a tab somewhere.
  • A *RELATIONAL* database that scales like BigTable or SimpleDB

There’s more to it than this, even, but I’ve stopped short to make a point that needs making. Namely, that these are hard problems. These are problems that PhD candidates in computer science departments do research on. I understand that. The database issue is one that is of particular interest to me, and which I think is one of the hardest issues (not only because of its relationship to the storage issue, by the way). Data in the cloud, for the masses, as we’ve seen, involves making quite a few rather grandiose assumptions about how your schema looks. Since that’s not realistic, the alternative is to flatten the look of the data, and take a lot of the variables out of the equation, so they don’t have to make *any* assumptions about how you’ll use/organize the data. “Let’s make it not matter”. Genius, even if it causes me pain. But I digress…

The idea here is just to give some people a clue about what direction (I think) people are headed in.

These are also very low-level wants. At a much, much, much higher level, I’d like to see one main, major thing happen with all of these services:

  • Get systems administrators involved in defining how these things are done

I’m not saying that because I want everything to stay the same and think a system administrator will be my voice in that or something. I do *NOT* want things to stay the same, believe me. I’m saying it because it seems pretty obvious to me that the people putting these things together are engineers, and not systems administrators. Engineers are the people you call when you want to figure out how to make something that is currently 16GB fit into 32MB of RAM. They are not the people you call when you want to provide a service/interface/grid/offering/whatever that allows systems folks to build what amounts to a company’s IT infrastructure on a grid/instance/App/whatever.

Here are a couple of examples:

When I first launched an AppLogic grid, a couple of things jumped out at me. The partitions on the components I launched were 90% full upon first boot, they had no swap partition, and there was no consistency between OS builds, so you can’t assume that a fix on one machine can be blown out via dsh or clusterssh to the rest. The components were designed to be as small as possible, so as to use as little of the user’s allotted resources as possible. In addition, mount points created in the GUI management interface and then mapped to a component… don’t cause the component to have any clue what you just did, which raises the question “umm… why did I bother using the GUI to map this thing to this component if I just have to edit /etc/fstab and mount it in the running instance myself anyway?” Getting back to consistency: this is unlike what happens if you had, say, allocated more RAM or storage, or defined a new network interface on the device in the GUI.

There is no part of EC2 or S3 that looks like a sysadmin was involved in that. It’s a programmer’s platform, from what I can tell. For programmers, by programmers. Luckily, I have enough background in programming that I kind of “get it”, but while I might be able to convince myself that there are parallels between how I approach programming and building infrastructures, it still represents a non-trivial context switch for me to move from working deeply at one to working deeply at the other, so mixing the two as a necessity for moving forward is less than nice.

There is no “database in the cloud” service that looks remotely like a database systems person was involved at all, that I can tell. I’ll confess to not having used BigTable or SimpleDB, but that’s because I can’t figure out how to make them useful to me at the moment. These tools are not relational, and my data, though it’s been somewhat denormalized for scaling purposes (compromises taken carefully and begrudgingly - I’d rather change database products, but it’s not in the cards), is nonetheless relational. I’ve considered all kinds of object schemes for data in the past, and I still think there’s some data for which that can work well, but it’s not currently a solution for me. Once you factor in the overhead of managing something like EC2, S3, AppLogic, etc., the very last thing you need is the overhead of a changing data storage/manipulation paradigm.

Should I be hiring systems folks, or developers? Both? Ugh. Just when I thought you could finally get away with building a startup with nothing more than an idea, a sysadmin and a coder, here they go roping me back into hiring a team of developers… to manage the systems… and the data. No good (and I mean *NO GOOD*) can come of developers managing data. I know, I’ve seen ‘em do it.

All of that said, I use all of this stuff. Multisourcing is here to stay - at least until someone figures a whole bunch of stuff out to make unisourcing a viable alternative for systems folks, or they collectively redefine what a “systems person” is, which is an extremely real possibility, but is probably quite a ways off. My $.02. Flame at will.

Categories: MySQL

Can MySQL temporary tables be made safe for statement-based replication?

Planet MySQL - May 26, 2008 - 8:17pm

A while ago I wrote about how to make MySQL replication reliable, part of which is to eliminate temporary tables. The idea is this: if a slave is stopped (or crashes) while a temporary table is open and is then restarted, the temporary table doesn't exist anymore, and the slave will have problems trying to replay any further statements that refer to it. Thus, I claimed, there's no alternative but to eliminate temporary tables. This problem may not exist for row-based replication in MySQL 5.1 and later, but most installations I know of are using statement-based replication, even on MySQL 5.1.

This is a contentious topic. People love their temporary tables and will ask hopefully "are you sure this isn't safe?" They'll propose all sorts of ways to mitigate the danger, and I've heard many of them. But I recently heard an angle on this I had not heard before.

The argument is this: "you can create an InnoDB temporary table and use it only within one transaction, and then if the slave crashes and restarts, it'll roll back the transaction to the beginning." In other words, in theory if the temporary table exists only within that one transaction, and if your transaction accesses only InnoDB tables, it's safe.

My first thought was, you can't do that. CREATE TABLE commits the transaction, so there's implicitly more than one transaction. However, as the person pointed out, that isn't true with CREATE TEMPORARY TABLE. I tested this (sometimes the manual is wrong!) and found that indeed, you can open a transaction, make some changes, create a temporary table with ENGINE=InnoDB, and the InnoDB transaction ID does not change in SHOW INNODB STATUS. The statements are all within one transaction. (However, if you type ROLLBACK the temporary table doesn't get dropped. It's not really transactional -- it just doesn't auto-commit the transaction. The ROLLBACK will produce a warning that says "Some non-transactional changed tables couldn't be rolled back", which is interesting.)
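
For concreteness, here is a minimal sketch of that behavior (the table name is mine; the warning text is as quoted above):

PLAIN TEXT SQL:
SET autocommit=0;
BEGIN;
CREATE TEMPORARY TABLE test.t(a int) ENGINE=InnoDB; -- does not commit the transaction
INSERT INTO test.t(a) VALUES(1);
ROLLBACK; -- warns: "Some non-transactional changed tables couldn't be rolled back"
SELECT * FROM test.t; -- still works: the temporary table was not dropped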

But does that mean it's safe for replication?

There is one good way to find out: test it. I fired up my master-and-two-slaves replication sandbox, flushed all the logs, and set out to get to the bottom of the matter.

First, I stopped the slave threads so I could choose which statements to replay on the slave and pick the "crash point" as I wished. (I didn't shut down the slave; I just stopped the replication processes. This is safe to do even when temporary tables are open.) Then I created a temporary table on the master, inserted a row into it, and dropped it:

PLAIN TEXT SQL:
  1. master> SET autocommit=0;
  2. master> begin;
  3. master> CREATE TEMPORARY TABLE test.t(a int) engine=innodb;
  4. master> INSERT INTO test.t(a) VALUES(1);
  5. master> DROP TEMPORARY TABLE test.t;
  6. master> commit;

In theory, that's all in one transaction. Since I flushed the logs before I did this, everything in the binary log so far comes from these statements. Let's look at the binary logs:

PLAIN TEXT SQL:
  1. master> SHOW master STATUS;
  2. +------------------+----------+--------------+------------------+
  3. | File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
  4. +------------------+----------+--------------+------------------+
  5. | mysql-bin.000007 |      474 |              |                  |
  6. +------------------+----------+--------------+------------------+
  7.  
  8. master> SHOW binlog events IN 'mysql-bin.000007'\G
  9. *************************** 1. row ***************************
  10.    Log_name: mysql-bin.000007
  11.         Pos: 4
  12.  Event_type: Format_desc
  13.   Server_id: 1
  14. End_log_pos: 98
  15.        Info: Server ver: 5.0.45-log, Binlog ver: 4
  16. *************************** 2. row ***************************
  17.    Log_name: mysql-bin.000007
  18.         Pos: 98
  19.  Event_type: Query
  20.   Server_id: 1
  21. End_log_pos: 207
  22.        Info: CREATE TEMPORARY TABLE test.t(a int) engine=innodb
  23. *************************** 3. row ***************************
  24.    Log_name: mysql-bin.000007
  25.         Pos: 207
  26.  Event_type: Query
  27.   Server_id: 1
  28. End_log_pos: 271
  29.        Info: BEGIN
  30. *************************** 4. row ***************************
  31.    Log_name: mysql-bin.000007
  32.         Pos: 271
  33.  Event_type: Query
  34.   Server_id: 1
  35. End_log_pos: 90
  36.        Info: INSERT INTO test.t(a) VALUES(1)
  37. *************************** 5. row ***************************
  38.    Log_name: mysql-bin.000007
  39.         Pos: 361
  40.  Event_type: Query
  41.   Server_id: 1
  42. End_log_pos: 176
  43.        Info: DROP TEMPORARY TABLE test.t
  44. *************************** 6. row ***************************
  45.    Log_name: mysql-bin.000007
  46.         Pos: 447
  47.  Event_type: Xid
  48.   Server_id: 1
  49. End_log_pos: 474
  50.        Info: COMMIT /* xid=39 */

Very interesting! The order of statements in the binlog is not the same as the order in which I typed them into the console. If you replay the binary log, you'll get two transactions here.

This shows us something interesting that isn't considered in the "all inside one transaction" argument: transactions aren't the only thing that matters. How the server logs events to the binary log is equally important. It appears that we can break replication on the slave by killing the slave after event 98 executes and before event 207 executes. But let's not draw any conclusions yet. The only way to tell for sure is to really test it.

Since I'd stopped the slave, I could easily test my theory. Let's let the slave replay events up until position 207, kill it, and restart it:

PLAIN TEXT SQL:
  1. slave1> SHOW slave STATUS\G
  2. *************************** 1. row ***************************
  3.             Master_Log_File: mysql-bin.000006
  4.         Read_Master_Log_Pos: 98
  5.              Relay_Log_File: mysql_sandbox20551-relay-bin.000028
  6.               Relay_Log_Pos: 235
  7.       Relay_Master_Log_File: mysql-bin.000006
  8.            Slave_IO_Running: No
  9.           Slave_SQL_Running: No
  10.          ...... omitted .........
  11.  
  12. slave1> start slave until master_log_file='mysql-bin.000007', master_log_pos=207;
  13. slave1> SHOW slave STATUS\G
  14. *************************** 1. row ***************************
  15.             Master_Log_File: mysql-bin.000007
  16.          .... omitted ........
  17.         Exec_Master_Log_Pos: 207
  18.  
  19. slave1> SHOW STATUS LIKE '%temp%';
  20. +------------------------+-------+
  21. | Variable_name          | Value |
  22. +------------------------+-------+
  23. | Slave_open_temp_tables | 1     |
  24. +------------------------+-------+

The slave is now "vulnerable," in theory. To test my theory, I'll shut down and restart the slave gracefully, rather than simulating a crash with kill -9, and see what happens.

PLAIN TEXT SQL:
  1. $ ./node1/stop
  2. $ ./node1/start
  3. $ ./s1
  4. slave1> SHOW slave STATUS\G
  5. *************************** 1. row ***************************
  6.                  Last_Errno: 1146
  7.                  Last_Error: Error 'Table 'test.t' doesn't exist' on query. Default database: ''. Query: 'INSERT INTO test.t(a) VALUES(1)'

That's the error I thought I'd see. Even though it was used entirely within one transaction on the master, the temporary table was not safe for replication.

I'm pretty sure this is a bug. The temporary table shouldn't be logged out-of-order on the master like this (I suspect it's logged out-of-order because CREATE TEMPORARY TABLE can't be rolled back). But bug or no, it is what it is.

There's one more angle to the email thread that inspired this article: what if the whole transaction is inside a stored procedure? Whether this works or not depends, again, on how the stored procedure call is logged to the binary log. Let's create a stored procedure to hold the transaction, which this time will insert data from the temporary table into a non-temporary InnoDB table:

PLAIN TEXT SQL:
  1. master> delimiter //
  2. master> CREATE procedure test_temp() begin
  3.     -> start transaction;
  4.     -> CREATE TEMPORARY TABLE test.t(a int) engine=innodb;
  5.     -> INSERT INTO test.t(a) VALUES(1);
  6.     -> INSERT INTO test.ins(a) SELECT * FROM test.t;
  7.     -> DROP TEMPORARY TABLE test.t;
  8.     -> commit;
  9.     -> end//
  10. master> delimiter;

Now calling the stored procedure should put a row into the test.ins table. Let's see:

PLAIN TEXT SQL:
  1. master> call test_temp();
  2. master> SELECT * FROM test.ins;
  3. +------+
  4. | a    |
  5. +------+
  6. |    1 |
  7. +------+

Good. Let's see what's in the binary log:

PLAIN TEXT SQL:
  1. master> SHOW binlog events IN 'mysql-bin.000011'\G
  2. *************************** 1. row ***************************
  3.    Log_name: mysql-bin.000011
  4.         Pos: 4
  5.  Event_type: Format_desc
  6.   Server_id: 1
  7. End_log_pos: 98
  8.        Info: Server ver: 5.0.45-log, Binlog ver: 4
  9. *************************** 2. row ***************************
  10.    Log_name: mysql-bin.000011
  11.         Pos: 98
  12.  Event_type: Query
  13.   Server_id: 1
  14. End_log_pos: 211
  15.        Info: USE `test`; CREATE TEMPORARY TABLE test.t(a int) engine=innodb
  16. *************************** 3. row ***************************
  17.    Log_name: mysql-bin.000011
  18.         Pos: 211
  19.  Event_type: Query
  20.   Server_id: 1
  21. End_log_pos: 279
  22.        Info: USE `test`; BEGIN
  23. *************************** 4. row ***************************
  24.    Log_name: mysql-bin.000011
  25.         Pos: 279
  26.  Event_type: Query
  27.   Server_id: 1
  28. End_log_pos: 94
  29.        Info: USE `test`; INSERT INTO test.t(a) VALUES(1)
  30. *************************** 5. row ***************************
  31.    Log_name: mysql-bin.000011
  32.         Pos: 373
  33.  Event_type: Query
  34.   Server_id: 1
  35. End_log_pos: 198
  36.        Info: USE `test`; INSERT INTO test.ins SELECT * FROM test.t
  37. *************************** 6. row ***************************
  38.    Log_name: mysql-bin.000011
  39.         Pos: 477
  40.  Event_type: Query
  41.   Server_id: 1
  42. End_log_pos: 288
  43.        Info: USE `test`; DROP TEMPORARY TABLE test.t
  44. *************************** 7. row ***************************
  45.    Log_name: mysql-bin.000011
  46.         Pos: 567
  47.  Event_type: Xid
  48.   Server_id: 1
  49. End_log_pos: 594
  50.        Info: COMMIT /* xid=124 */
  51. 7 rows IN SET (0.00 sec)

What you see depends on your version of MySQL, because the logging of stored procedures has changed over time. If just the CALL statement had been logged, I think we might have been safe using a stored procedure. However, since all the statements went into the binlog individually, there's clearly an opportunity to break replication here. It looks like this doesn't avoid the problem either.

Interestingly, I also created a version of the stored procedure that doesn't begin and commit a transaction itself. After calling it, the CREATE TEMPORARY TABLE statement is logged into the binlog; only after typing COMMIT do the rest of the statements go into the binlog. It appears to me that there's no way to get the CREATE TEMPORARY TABLE statement logged inside the transaction. And when it comes to a replication slave, what's logged -- not what executes on the master -- is what's important.
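
That variant might have looked something like the following sketch (the procedure name is mine):

PLAIN TEXT SQL:
master> delimiter //
master> CREATE PROCEDURE test_temp2() BEGIN
    ->   CREATE TEMPORARY TABLE test.t(a int) ENGINE=InnoDB;
    ->   INSERT INTO test.t(a) VALUES(1);
    ->   INSERT INTO test.ins(a) SELECT * FROM test.t;
    ->   DROP TEMPORARY TABLE test.t;
    -> END//
master> delimiter ;
master> SET autocommit=0;
master> CALL test_temp2();  -- the CREATE TEMPORARY TABLE event is logged right away
master> COMMIT;             -- the remaining statements are logged only now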

In summary, I still don't see any way to use temporary tables with MySQL statement-based replication without some risk of breaking slaves. At some point I may test how this works with row-based replication; I believe even the row-based logging format will have some problems, because the CREATE TABLE is logged in statement format. But that's a topic for another post.

Entry posted by Baron Schwartz | 8 comments


Categories: MySQL

NDB$INFO

Planet MySQL - May 26, 2008 - 7:43pm

There’s been talk over the years of better monitoring for NDB (MySQL Cluster). This has been dubiously named NDB$INFO, after some special magical naming convention for tables holding information on the insides of NDB. Otherwise known as Worklog 3363 (viewable on MySQL Forge).

The basic idea is to get a bunch of things that are already known inside NDB available through a rather standard interface (SQL is preferred).

My top examples are “How much DataMemory is used?” and “Do I need to increase MaxNoOf(Tables|Attributes|ConcurrentTransactions)?”. You can get some of this information now either through the management client (ndb_mgm -e “all report MemoryUsage”) or the MGM API using events and some other foo.

This is a rather limited interface, though. It would be great if you could point all your monitoring stuff at a MySQL Server, throwing queries at it and finding out the state of your cluster.
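
For illustration, the kind of query this would enable might look something like this (purely hypothetical - the actual table and column names were still to be designed):

PLAIN TEXT SQL:
SELECT node_id, used_bytes, total_bytes
FROM `ndb$info`.`memory_usage`
WHERE resource_name = 'DataMemory';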

So this year I’ve been working on implementing NDB$INFO. The big requirements (for me at least) are:

  1. Everything can be queried easily from SQL
  2. It's easy to add a new NDB$INFO table (for an NDB developer)
  3. You can use NDB$INFO tables to diagnose problems (such as nodes not connecting)

Among the 492 things I'm currently doing is fixing up a basic patchset for NDB$INFO and working on getting it into the tree. It's all going to be basic scan interfaces in the current version, so things may be slow if there are lots of rows, but they'll get there.

What would you like to see exposed?

Categories: MySQL

Encrypted Online Backup (design, thoughts, ask-the-lazyweb)

Planet MySQL - May 26, 2008 - 7:07pm

So after an ever so temporary but loud moment of insanity[1] - a decision being made which I very strongly disagreed with (wanting to release online encrypted backup as closed source) - we're back in the world of freedom, and the MySQL Server is (and will be) free and open source software (dual licensed, so you can buy a commercial license of the same thing).

[1] Addition (wanting to remove my use of the word): Marten (rightly) points out that although he appreciates the new blog posts, he doesn't appreciate having his decisions called insanity. He's right. It's the wrong way to put it. So, without wanting to censor or change history (instead preferring to illustrate my own stupidity and amazing ability to completely say the wrong thing every 6 months or so), I offer this clarification (which I have tried to express in about 3 drafts of blog posts, none of which have seen the light of day, as I was never really happy with them): the decision was made with all the right intentions (grow the company, end up producing more free software, make sales to enterprises easier, clearer differentiation, etc.) but it was one that I (and many others) rather strongly disagreed with. In the end, the decision was made to have these parts as free software, and I truly believe this came after more arguments were presented by myself (and others) about why having these parts closed was a bad idea. It is quite the thing to decide to make modules for your free software product closed; it is about 15 steps higher to go back on it. I'll share a phrase I used a few times when being a right nit-picker about things during employment contract negotiation this year (for MySQL Australia and then Sun): "Do I trust Marten? Absolutely. It's the next guy. Remember, SCO was once Caldera, producing a Linux distro and generally considered good." So, that was more than I intended to write on the subject… but hopefully it clarifies that I just thought the decision itself was bad, and I am lucky enough to work at a place that encourages discussion when you don't like things.

So, now I'm involved with writing up the worklog for encryption for the MySQL server's native online backup. I also wrote most of the original worklog for compression of online backup (I implemented compressed backup and LCP for MySQL Cluster) as well as some proof-of-concept code (written in <5 minutes at 3am while jetlagged).

There are two main approaches to encryption: symmetric and asymmetric (public key). I think we should support both (but we’ll see what others think).

For symmetric (password-based, for those not up with the street lingo of crypto) we're thinking of the following algorithms: 3DES, AES, Blowfish. Are there any others that people care about?

DES is obviously out as it’s not considered secure, and really, we should be helping users to get things right.

For public key: RSA and DSA are the obvious choices.

As for libraries implementing all of these? Well… I'm thinking about libgcrypt - it looks fairly nice and a bit similar to the kernel crypto API (which seems quite nice). Anybody got any other suggestions? Things you'd like to see? Thoughts?

EDIT: Server not Service. We sell services, the server is free and open source. I fail.

Categories: MySQL

Presenting a Webinar on Memcached Use Cases

Planet MySQL - May 26, 2008 - 6:47pm
Quick link: register for Designing and Implementing Scalable Applications with Memcached and MySQL webinar (June 29)
Ever since its introduction, memcached has been changing the way cost-efficient caching is perceived. Some passionately love it, others cynically hate it.

Today, many large scale web 2.0 properties (including my employer) save millions of dollars by depending on memcached to bring their application response time under control and to offload pressure from databases.

There are several success stories about using memcached to speed up database-driven websites. Facebook, for instance, runs the largest memcached installation, and the numbers only keep increasing. In May 2007, Facebook was reportedly running 200 dedicated servers with 3TB of memory in their memcached cluster. At the "Scaling MySQL Up or Out" keynote, Facebook revealed they are now using 805 dedicated memcached servers. That's more than a four-fold increase in less than a year!

Twitter, digg, Wikipedia, SourceForge, and even Slashdot depend on memcached to keep their users happy.

For my employer, memcached has been a crucial component of the infrastructure that has been instrumental in handling explosive growth in a cost-efficient manner. In addition, memcached has helped us offload billions of queries from our database.

To highlight several real-life use cases of memcached (see below), I will be presenting a memcached webinar on Thursday, June 29 at 1 PM EST (10 AM PST). Monty Taylor (Senior Consultant, Sun Microsystems) and Jimmy Guerrero (Sr Product Marketing Manager, Sun Microsystems - Database Group) will also be speaking at the event. Space is limited and filling up fast (200+ registrants already) so I highly recommend registering now.

In this webinar, I will be covering several use cases for memcached including (but not limited to):
  • deterministic cache
  • non-deterministic cache
  • proactive cache
  • "state" cache
  • filesystem cache replacement
Hope to "see" you at the webinar.

Note: This memcached webinar is not to be confused with the memcached webinar being presented by Ivan Zoratti on June 28.
Categories: MySQL

eHorizons

Planet MySQL - May 26, 2008 - 6:18pm

I flew back into Sydney on Sunday morning to give a tutorial at Sun's Expanding Horizons summit. It was a half-day tutorial on MySQL Cluster - a shortened version of the one I've given at the MySQL User Conference for the past few years. I had about 15 attendees, all of whom had done their homework (it probably helped that they were pestered via phone :)

The tutorial went really well. It really helps when everybody has done the homework and already has Linux and MySQL Cluster installed. Everybody got up and running (we used mysql-test-run to start a cluster rather than writing the config file from scratch, which made things happen a lot faster). Also got some good feedback - yay! We may even have some people looking to deploy it after attending, always a plus.

I also gave a “Scaling MySQL” talk that was well attended. I didn’t talk at all about query optimisation, mysqld configuration tuning or stuff like that - instead focusing on making the app saner, caching etc. memcached, of course, got a good mention :) It seemed to go down well, some good questions, and a rather full room.

So a rather productive two days for spreading the freedom love.

However, the conference dinner was a complete FAIL on account of the venue. I don't know which vegetarians/vegans call beef and fish vegetarian, but I've never met one (hint: they don't exist). This was *after* the explanation about being vegan. Then… there was some discussion about pasta with a tomato/vegetable sauce, which never came. So as others were finishing their meals, I inquired again - eventually, something was brought over. Undercooked rice and undercooked steamed vegetables. I don't know who eats that for dinner (hint: nobody). Of course, after the pasta discussion, I had selected a wine that would go with it. After more of the stuff-ups, I pointed out that there was no way I was going to pay for the wine when shit like that was served (yes, in those words… perhaps I've been watching too much Gordon Ramsay).

It was the first time ever that I’ve left a restaurant during a function, gone down the street, gotten take away and brought it back. Novotel Brighton Beach (in Sydney) - you suck.

(there's also a beautiful view across the bay of the runways of Sydney airport… which is fine if you can sleep through planes landing and taking off, like I can, but I know others can't).

Will never stay at the Novotel Brighton Beach voluntarily, ever. On the plus side, the guy at the desk when checking out was very apologetic…

Categories: MySQL

MySQL University - Lua advanced scripting

Planet MySQL - May 26, 2008 - 5:15pm

MySQL University is a set of public lessons held by experienced MySQL developers, to spread technical knowledge inside and outside the company.

It's like a conference session, minus the physical presence. There is no traveling involved, no hotel rooms to book, no restaurant bills to pay. The lessons are given over the internet, using an audio stream for the exposition and IRC for questions from the audience. If you want to attend, check the instructions for attendees and mark your calendars:

Thursday, May 29th, at 15:00 CEST (= 6:00am PDT, 8:00am CDT, 9:00am EDT).

The session will cover Advanced Lua Scripting, i.e. how to write MySQL Proxy scripts that go beyond basic actions.

Among the topics covered:

  • handling multiple queries
  • dynamic loading of modules at run time
  • cookbook recipes for result sets
Categories: MySQL

MySQL Workbench 5.0.22 Coming Up - Switch to Bazaar

Planet MySQL - May 26, 2008 - 3:37pm

The next bug-fix release of MySQL Workbench is coming up this week. Tax is still working on the build; we had some unit tests failing that are now passing again. Apart from the usual bug fixes, this will be the first release created out of our new Bazaar repositories, http://bazaar-vcs.org/

The team has been relying on the Subversion version control system, which worked great for us over the last years. But as we did the branching for MySQL Workbench 5.1, we decided it was time to move to a more advanced system. We looked at Mercurial http://www.selenic.com/mercurial/wiki/ but decided to go with Bazaar instead, which performed excellently in our tests.

We hope to finish the WB 5.0.22 release by tomorrow and then it will take 24h for the mirrors to catch up.

Categories: MySQL

MySQL Survey Online

Planet MySQL - May 26, 2008 - 2:59pm

With the extensive help of Mark Schoonover, I have a survey online for those who use MySQL in their work. You might be a DBA or you might be a developer. If you use MySQL in your job, I want your input! I want as many people as possible to take the survey, to get as accurate a picture of MySQL usage as I can. It takes less than ten minutes, and it will help everyone get a better idea of what is going on in the MySQL world.

I would encourage anyone to link to the survey to generate as much traffic as possible. It will be online for three weeks, closing on June 16th at midnight EST. The results will be published in the summer issue of MySQL Magazine, which will be available July 15th.

The survey is hosted here: MySQL Magazine Survey

Thanks for your survey input!

Categories: MySQL

Performance of status commands

Planet MySQL - May 26, 2008 - 2:45pm

Though some may think status command performance is overrated, our code heavily relies on SHOW STATUS LIKE "Thread%", and I like to spam SHOW INNODB STATUS. Add monitoring systems, which rely on ‘SHOW GLOBAL STATUS’ - and performance starts to matter.

And things are a bit abandoned. See, in performance engineering people always look for low hanging fruit to pick, and there’re quite a few:

  • Peter wrote a year ago about a performance regression in ‘SHOW STATUS’. Of course, now that Mark has filed a bug - the patch is already in there. This bug would byte us very, very severely if we ran a live site against unpatched 5.0, as it slows the command down from 0.2ms to 20ms. There’s a small ha-ha in this issue - guess where these two lines come from. ;-)
  • SHOW INNODB STATUS was taking 0.3s just because there were two rarely-looked-at variables that required traversing quite a few structures. I filed bugs #36941 and #36942 to remind developers of this problem. Disabling these two counters allowed running a few thousand ‘SHOW INNODB STATUS’ per second, instead of just three.
  • SHOW GLOBAL STATUS traverses statistics for each thread. That doesn’t matter too much on a single-threaded development box, but it is very, very unhappy at higher thread concurrency - walking every variable on every thread is quite a task even for modern hardware. Some SMP scaling was gained by reducing locking on global variables this way - at the cost of making the command unusable.
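
For reference, these are the three commands whose cost is being discussed - a quick sketch of what a monitoring pass might run:

PLAIN TEXT SQL:
SHOW STATUS LIKE 'Thread%';  -- per-session status; hit by the 5.0 regression above
SHOW INNODB STATUS;          -- took ~0.3s with the two expensive counters enabled
SHOW GLOBAL STATUS;          -- aggregates status variables across every thread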
Categories: MySQL

INSERT speed, paramy, auto-inc

Planet MySQL - May 26, 2008 - 2:17pm

One of the first things I noticed when I tried to run some 5.0 tests was that INSERT speed was much slower (no benchmarking at that time, just the purely subjective feeling that loading the dump was somewhat too slow).

After discussing it in the evening, and some coding in the morning, I built paramy - a simple C++ utility that reads an .sql stream and spreads execution of the statements over multiple worker threads.

For benchmarking I took a simple 13m-row table with 1.6G of data and 1.7G of index. Loading it with the standard mysql client took 30min (7200 r/s). Paramy unfortunately hit the auto-inc lock and didn’t provide much of a performance gain - the data load took 25min (8700 r/s). The win here was mostly because everything up to InnoDB ran in parallel: instead of 100% of a core, mysqld went up to ~130%, so it obviously managed to get additional cycles from SMP.

Dropping AUTO-INC from the field description changed everything. The standard mysql load wasn’t much faster, but under paramy load mysqld managed to use ~320% of a CPU core, and did the load in 10min (~21500 r/s).

The interesting part: none of the INSERTs ask for AUTO-INC values, so in theory one could simply disable the locking during the load and re-enable it afterwards, the same way unique or foreign key constraint checks are re-enabled after a data import - that way, loading data into InnoDB would be much, much faster, especially with many more cores - though then the contention moves to the transaction log mutex.
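
There is no session switch for the auto-inc lock itself, but for comparison, the constraint-check toggles mentioned above look like this in a bulk-load session (a sketch; dump.sql is a placeholder):

PLAIN TEXT SQL:
SET unique_checks=0;       -- skip unique-key checking during the import
SET foreign_key_checks=0;  -- skip foreign-key checking during the import
SOURCE dump.sql;
SET unique_checks=1;       -- re-enable once the data is in
SET foreign_key_checks=1;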

An additional way to reduce contention inside InnoDB may be to remove both the read and write checksums - they account for quite a big share of the CPU load. Disabling them, at least during bulk imports, can be quite useful.

Oh, I mentioned the subjective feeling that 5.0 was slower. I finally benchmarked it - 4.0 did the data load in five minutes, at over 40000 rows/s. I hope this is the last test where 4.0 is twice as fast as 5.0.

Categories: MySQL

5.0

Planet MySQL - May 26, 2008 - 1:58pm

I’m trying to run 5.0. Even Mark said he EOL’ed 4.0. Anyway, the upcoming journey will be very fun, and I’ll try writing down some notes here.

Categories: MySQL

Image gallery: The sorry state of law enforcement data sharing

Planet MySQL - May 26, 2008 - 8:00am
Law Enforcement Databases chart

Categories: MySQL

Sun & MySQL at Linuxtag 2008 Berlin (2008-05-28/2008-05-31)

Planet MySQL - May 26, 2008 - 3:00am

From May 28th-31st, the annual LinuxTag will take place in Berlin, Germany. I followed the growth and evolution of LinuxTag from the very early days and I have fond memories of the event back when it still took place at the University of Kaiserslautern and our SuSE "booth" was just a regular table taken from the lecture rooms...

Things have evolved a lot since then. Today, LinuxTag is one of the largest Linux/Open Source Events in Europe and my new employer Sun is a major sponsor this year. In addition to several talks and keynotes, there will be a large Sun booth in the exhibition area (Booth #205) and we will have a dedicated MySQL demo pod! Some of the things we plan to demo there are the upcoming MySQL Server releases (5.1, 6.0 with Falcon and Online Backup), MySQL Workbench, MySQL Enterprise Monitor as well as how to combine these with other Sun products like Glassfish, NetBeans, OpenSolaris or OpenOffice.

Some other stuff that we will be showcasing on the Sun booth:

  • Be Brilliant Faster with OpenSolaris: Develop, Debug, Deploy Apps Faster with ZFS and Dtrace, OpenSolaris Live CD – Fast, Free, and Easy to Install
  • Virtualize Your Business with xVM and VirtualBox: OpenSolaris, Windows, Linux & Mac OS X Virtualized, Develop on VirtualBox, Deploy on xVM, Free & Open
  • Sun Studio Software for OpenSolaris and Linux: C/C++/Fortran Compilers and Tools, x86 and SPARC
  • Cool New Features in OpenOffice.org 3.0: Importing PDFs and Managing Appointments, now with full support for Mac OS X (Aqua)
  • Discovering Open High Availability Cluster: Overview about HA Clusters, Community Group Projects, Single Node Cluster – Service Failover between Zones
  • GlassFish - the Open Source Java EE 5 Application Server: JRuby/ Rails, Ajax & Comet

I look forward to being there! Please contact me if you are interested in visiting Linuxtag and would like to receive a free pass!

Categories: MySQL

Corruption using MySQL AES_[EN|DE]CRYPT functions

Planet MySQL - May 25, 2008 - 3:52pm

I was contacted this week by a previous client regarding a failure in processing data. This was a Contact, Financial and Media Management system I developed for a not-for-profit organization on a LAMJ stack, and I've had to do nothing to it since deployment 3 years ago - no bug fixes, no feature enhancements. The only thing lacking is additional management reporting, and data is extracted for that for now.

It runs on commodity hardware with Linux and MySQL, and its only unscheduled downtime was a power failure before UPS power was installed. However, this all changed last week. Processing of regularly scheduled encrypted data simply failed unexpectedly.

A summary of the environment.

  • Data is inserted with AES_ENCRYPT(data,key), as sketched after this list;
  • Data is retrieved with AES_DECRYPT(data,key);
  • Data is never updated.
  • New data is always added, and historical data always remains.
  • The application has no end user access to modify data.
  • The application has no function anywhere to modify the data, so no rogue change could have occurred.
  • An AUTO_INCREMENT column and TIMESTAMP columns provide a level of auditing for data forensics.
  • Backup copies of data exist up to 3 years for reference.
  • The seed key has not changed.
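
A minimal sketch of the insert/retrieve pattern referred to above (the table and column names here are hypothetical; the encrypted column is a BLOB):

PLAIN TEXT SQL:
INSERT INTO payments (card_data, created)
  VALUES (AES_ENCRYPT('sensitive text', 'seed_key'), NOW());

SELECT AES_DECRYPT(card_data, 'seed_key') AS plain
  FROM payments ORDER BY id;
-- AES_DECRYPT() returns NULL (or garbage) when the stored bytes no longer
-- decrypt cleanly with the key - which is how this corruption shows up.
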
The problem

Selecting the first 10 rows saved in the table (by AUTO_INCREMENT key, and confirmed by dates), 8 of 10 are now corrupt. Selecting the last 10 rows inserted, zero are corrupt. Across 20,000 records, 75% are now corrupt.

A lot of analysis has been performed to identify and track the data that was recorded - a certain amount of data forensics - and it was confirmed, for example, that information was successfully processed last month. As this system performs financial transactions, there is a lot more auditing available and being reviewed; however, it is simply a mystery that I can't solve.

  • What options remain? Is this a hardware problem - disk, or even memory?
  • What other data may be corrupt?
  • How can more investigation occur to track down the cause of the problem?
mysql> select version();
+------------------+
| version()        |
+------------------+
| 4.1.10a-standard |
+------------------+
1 row in set (0.00 sec)
Categories: MySQL

Let the coding begin! The Google Summer of Code 2008

Planet MySQL - May 25, 2008 - 2:44pm
Let the coding begin! The Google Summer of Code 2008 coding session has begun.

I am maintaining this blog as a way of recording my progress for others to follow (and to prevent me from slacking off).

I am working on integrating Memcached with the MySQL Query Cache (mentored by Brian Aker from MySQL).

I've done some bonding through the mailing list, blogs, and IRC, but I feel I should have done a bit more. I just moved back from college to my parents' house and started working full time in Boston (wow, what a commute). I've read through the code a few times and feel I have an understanding of how it works. It compiles, tests, and runs on my dev box (gentoo 2.6). I've started reading the book Google sent me (SPOILER) and it is wonderful.


In reading through the MySQL codebase I've come up with a list of design decisions I will need to make in the first week of my project.

1)
Implement this as a different query cache storage engine of sorts by building an interface. One version uses the classic single system memory, the other uses Memcached and is controlled by a flag somewhere.
OR
Code this as a straight patch.

2)
How to optimize the functionality of the cache? Tables that update often are actually slower when cached, due to the way caches are currently pruned. Is there a cleaner way to do this? Do I pursue this concurrently, or wait till I've finished the initial goal of Memcached support and then try to rework it?

3)
Using one Memcached cluster for several databases would be a good way to save on memory, but there are several things to consider in order to retain 'ACID' principles. Again, do I try and work on this concurrently, or be more cautious so as not to bite off more than I can chew (which I have a tendency to do...)

4)
How much time should I leave at the end for perf testing, regression testing, etc...
Making this work is wonderful, but if it slows down the cache further is it really worthwhile?


Goals for Monday June 2nd:
-Communicate more.
-Have a solid timeline complete with estimated goals that I can rework at each iteration.
-Have answers to these design decisions that I can begin actively working on.
-Have real code, that compiles, and maybe does something (stubs and such).
Categories: MySQL
