Jay Janssen
Yahoo!, Inc.
jayj at yahoo dash inc dot com
MySQL
High Availability
Global Load Balancing
Failover
iostat is one of the most important tools for measuring disk performance, which of course is very relevant for database administrators, whether your chosen database is Postgres, MySQL, Oracle, or anything else that runs on GNU/Linux. Have you ever wondered where statistics like await (average wait for the request to complete) come from? If you look at the disk statistics the Linux kernel makes available through files such as /proc/diskstats, you won’t see await there. How does iostat compute await? For that matter, how does it compute the average queue size, service time, and utilization? This blog post will show you how that’s computed.
First, let’s look at the fields in /proc/diskstats. The order and location varies between kernels, but the following applies to 2.6 kernels. For reads and writes, the file contains the number of operations, number of operations merged because they were adjacent, number of sectors, and number of milliseconds spent. Those are available separately for reads and writes, although iostat groups them together in some cases. Additionally, you can find the number of operations in progress, total number of milliseconds during which I/Os were in progress, and the weighted number of milliseconds spent doing I/Os. Those are not available separately for reads and writes.
The last one is very important. The field showing the number of operations in progress is transient — it shows you the instantaneous value, and this “memoryless” property means you can’t use it to infer the number of I/O operations that are in progress on average. But the last field has memory, because it is defined as follows:
Field 11 — weighted # of milliseconds spent doing I/Os
This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating.
So the field indicates the total number of milliseconds that all requests have been in progress. If two requests have been waiting 100ms, then 200ms is added to the field. And thus it records what happened over the duration of the sampling interval, not just what’s happening at the instant you look at the file. We’ll come back to that later.
Now, given two samples of I/O statistics and the time elapsed between them, we can easily compute everything iostat outputs in -dx mode. I’ll take them slightly out of order to reflect how the computations are done internally.
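As a concrete sketch, the per-interval arithmetic looks roughly like this. This is an illustration, not iostat's actual source; the function and field names are mine, and the field positions assume the 2.6 kernel layout described above.

```python
# A minimal sketch (not iostat's source) of how the -dx metrics can be derived
# from two samples of the per-disk counters in /proc/diskstats. Field names
# are mine; the layout assumed is the 2.6 one described above:
#   reads, reads_merged, sectors_read, ms_reading,
#   writes, writes_merged, sectors_written, ms_writing,
#   ios_in_progress, ms_doing_io, weighted_ms_doing_io

def iostat_metrics(prev, curr, interval_s):
    """prev/curr are dicts of raw counters; interval_s is seconds between samples."""
    d = {k: curr[k] - prev[k] for k in prev}  # per-interval deltas
    ios = d["reads"] + d["writes"]            # I/Os completed in the interval
    interval_ms = interval_s * 1000.0
    return {
        # await: total ms requests spent queued + serviced, per completed request
        "await": (d["ms_reading"] + d["ms_writing"]) / ios if ios else 0.0,
        # avgqu-sz: the "weighted" field 11 delta divided by the interval --
        # this is where that field's "memory" pays off
        "avgqu-sz": d["weighted_ms_doing_io"] / interval_ms,
        # %util: fraction of the interval the device had at least one I/O in flight
        "%util": 100.0 * d["ms_doing_io"] / interval_ms,
        # svctm: busy time per completed I/O
        "svctm": d["ms_doing_io"] / ios if ios else 0.0,
    }
```

For example, with 200 I/Os completing over a 1-second interval, 2000ms accumulated in field 11, and the device busy for 500ms, this yields await = 10ms, avgqu-sz = 2.0, %util = 50, svctm = 2.5ms.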
Although the computations and their results seem both simple and cryptic, it turns out that you can derive a lot of information from the relationship between these various numbers. This is one of those tools where a few lines of code have a surprising amount of meaning, which is left for the reader to understand. I’ll get more into that in the future.
There was a discussion on LinkedIn one month ago that caught my eye:
Database search by "within x number of miles" radius?
Anyone out there created a zipcode database and created a "search within x number of miles" function ?
Thankful for any tips you can throw my way..
J
A few people commented that some solutions wouldn't scale. To understand why these sorts of geographic search queries are problematic in MySQL, it's best to show some execution plans on dummy data:
Did you notice that we estimate just as many rows in the first EXPLAIN as in the second one? That doesn't make any sense! The index covers x, y and col_a and should eliminate a lot of searching, since there is only one row which meets this condition!
The reason for this is simply a missing feature of the MySQL optimizer - and it has to do with using x BETWEEN 30 and 40 (and it's also true with x >= 30 AND x <= 40). Using a range like this prevents us from using the rest of the index. There is a workaround, but it's not pretty:
The ugliest thing about this is that in real life you wouldn't know all the possible values of X or Y, and so you may end up with a very big IN list. The workaround is to create steppings of the values X and Y that we can use for indexes:
Fantastic! The only remaining problem with this query is that it's not quite identical to our original. In this query 60.79 will be floored to 60 (and erroneously included in our results):
However, there is a quick fix: re-include the original WHERE conditions (we just no longer use an index on them):
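The steppings workaround can be sketched as a small query builder. This is my own illustration, not code from the post: it assumes a hypothetical `points` table that carries indexed integer columns `x_floor = FLOOR(x)` and `y_floor = FLOOR(y)` maintained alongside the real coordinates.

```python
# Hypothetical sketch of the "steppings" workaround: query an IN list over the
# integer steps (so the composite index stays usable past the first column),
# then re-apply the exact ranges. Table and column names are made up.
import math

def stepped_range_query(x_lo, x_hi, y_lo, y_hi):
    x_steps = ", ".join(str(v) for v in range(math.floor(x_lo), math.floor(x_hi) + 1))
    y_steps = ", ".join(str(v) for v in range(math.floor(y_lo), math.floor(y_hi) + 1))
    return (
        "SELECT * FROM points "
        f"WHERE x_floor IN ({x_steps}) AND y_floor IN ({y_steps}) "
        # re-apply the exact ranges so a value like 60.79 is not erroneously
        # included just because FLOOR(60.79) = 60
        f"AND x BETWEEN {x_lo} AND {x_hi} AND y BETWEEN {y_lo} AND {y_hi}"
    )

print(stepped_range_query(30, 40, 50, 60))
```

The IN lists let MySQL treat each step as an equality and continue using the remaining index columns, while the trailing BETWEEN conditions restore the original semantics.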
Conclusion:
My examples used only a small amount of data (16,000 rows) that fit in memory, but the original query would have done a full table scan if I hadn't used a FORCE INDEX hint. Add more data, and if X can't filter out enough rows by itself, this can become a real problem.
Workarounds are all very good, but they also make applications more difficult to maintain. If you really want to do these types of queries, you should give Sphinx a try.
Entry posted by Morgan Tocker | 3 comments
Dennis Wolf, former MySQL CFO, has been undergoing Plasmapheresis treatment in the last week and despite a mild setback due to an infection, he will be checking out of the hospital today to continue rehab as an outpatient. Dennis reports that he has increased mobility in his leg and that the test for Devic's Disease and Multiple Sclerosis has come back negative. So there's a bit of a mystery as to what has caused this NMO flareup, but the fact that it's not Devic's disease is good news as it means a recurrence is less likely.
Dennis will be continuing treatment as an outpatient for the next several weeks and will also be getting a second opinion from the experts up at UCSF.
Marten Mickos put together a couple of nice posters that were hanging in Dennis' hospital room including the one above featuring the MySQL executive team (known informally as the 4th Box Club) and a quote from James Stockdale:
“You must never confuse faith that you will prevail in the end — which you can never afford to lose — with the discipline to confront the most brutal facts of your current reality, whatever they might be.”
Dennis, we are thrilled to hear of your progress!
Feel free to add your comments for Dennis below.
FOSDEM 2010, Sunday 7 February, the MySQL Developer Room packed with 12 talks! And this year we serve Python just before the lunch break.
In 20 minutes I'll try to give an overview of the drivers currently available for connecting your Python applications with MySQL. In case you wonder: this will not revolve around MySQL Connector/Python alone!
We'll also go over some frameworks and tools like SQLAlchemy.
20 minutes is not much, but it should be enough. I hope to get a similar talk accepted for the MySQL Conference & Expo 2010.
and later on….
if (prgflag & (READ_ALL+EXTRA_RECORD)) records++;
Feel free to think about that for a second.
(I have an urge to add this to questions asked in a job interview…)
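To spell out the trap: adding flag constants only happens to equal OR-ing them when the bits are disjoint, and the test fires when either bit is set. A quick sketch (the flag values below are made up for illustration; the real ones live in the MySQL source):

```python
# Why "flags & (A + B)" is a trap. Values are made up for illustration.
READ_ALL     = 1 << 1   # 2
EXTRA_RECORD = 1 << 3   # 8

flags = READ_ALL
# With disjoint bits, + happens to equal |, and the condition is true when
# EITHER flag is set -- not when both are:
assert (READ_ALL + EXTRA_RECORD) == (READ_ALL | EXTRA_RECORD)
assert flags & (READ_ALL + EXTRA_RECORD)

# But if the "flags" overlap (e.g. one constant is a combination of others),
# addition silently produces a different mask than OR:
COMBO = READ_ALL | EXTRA_RECORD  # 10
assert (READ_ALL + COMBO) != (READ_ALL | COMBO)  # 12 vs 10
# 12 = 0b1100: the + version now tests bit 2, which neither flag defines.
```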
In total, we received 20 submissions from 15 speakers. We'd like to thank them very much for their great proposals!
As we only have 12 speaking slots (20 mins each) available that day, we first needed to perform a selection process. Unfortunately there wasn't enough time for a full-blown voting process involving the community at large. Since we didn't want to do this in a completely closed committee, we decided to involve all the speakers who had submitted a talk.
After this voting process, the final candidates are (ordered by last name):
Speaker: David Axmark
Title: Drizzle, A MySQL fork for the Web
Abstract:
Speaker: Piotr Biel
Title: Multi-Master Replication Manager for MySQL
Abstract:
Speaker: Ronald Bradford
Title: 10x performance improvements - A case study
Abstract:
While part of a longer talk, we will be reviewing the highlights including:
The 10 Points are:
Bonus: Front end improvements
Speaker: Kris Buytaert
Title: MySQL HA overview
Abstract:
Speaker: Stéphane Combaudon
Title: Correcting replication data drift with Maatkit
Abstract:
Replication with MySQL is very easy to set up but also very easy to break. As soon as inconsistencies are introduced, the master and the slaves start being desynchronized - and most often, you have no way to be quickly and efficiently aware of the problem as MySQL offers no built-in mechanism to check data consistency between servers.
Fortunately Maatkit has a neat solution with 2 nice scripts: mk-table-checksum, which is a tool to detect inconsistencies between a master and its slaves and mk-table-sync, which can correct automatically these problems detected with mk-table-checksum.
This session will show you first how to use both tools depending on your replication setup (master-slaves or master-master) and then how to get the most from the multiple options that are available.
Speaker: Marc Delisle
Title: State of phpMyAdmin
Abstract:
Speaker: Seppo Jaakola
Title: Galera Replication for MySQL
Abstract:
This session will present the current state of Galera replication project. The presentation will contain elements like:
Speaker: Vladimir Kolesnikov
Title: Effective SSD for your Database - a PBXT Primer
Abstract:
Speaker: Giuseppe Maxia
Title: Sharding for the masses
Abstract:
Speaker: Kristian Nielsen
Title: Beyond MySQL GA: patches, storage engines, forks, and pre-releases
Abstract:
The main benefit of a thread pool design is that users can now have thousands of connections to the MySQL Server in a scalable manner. It also provides performance that is more independent of the number of concurrently active connections than before.
Speaker: Geert Vanderkelen
Title: MySQL and Python: an overview
Abstract:
There has been lots of chatter about other languages, but not much about Python. This talk will give an overview of what is available to connect to MySQL, and also introduce MySQL Connector/Python.
These talks will now be put in the room schedule on the MySQL Forge Wiki. We look forward to having these sessions! If you happen to be a FOSDEM, please stop by at our Developer Room!
By Connie Chan, Manager, Yahoo! for Good
In December, Yahoo! launched our You In? program at kindness.yahoo.com to encourage people around the world to spread happiness by doing random acts of kindness. Thanks to you, our big-hearted Yahoo! community who helped spread the word, over 315,000 people from 11 countries updated their Yahoo! status with good deeds, and more than 500 stories were posted to the You In? Flickr Group.
Check out some of our favorites –
Inspired by amazing stories like these, Yahoo! also did some of our own random acts of kindness.
When we heard that Bill anonymously paid for a family’s overdue water bills, Yahoo! paid the heating bills for 12 families from a small town in Kentucky who were at risk of having their heat turned off in the winter.
Yahoos hand delivered 50 hot meals to farm labor families who didn’t have transportation to get to the food bank in Quincy, Washington.
And on the busiest travel day of the year, Yahoo! went to the San Jose and San Francisco International Airports to pay for people’s baggage fees.
But the kindness didn’t stop there! We also sent 50 kids from a Boys & Girls Club in Washington D.C. to the Nutcracker Ballet, took low income seniors in elderly care facilities on holiday shopping sprees in San Diego and Miami, gave DVD collections to children’s hospitals in Portland and Cincinnati, and more.
So our new year’s resolution is to keep this warm fuzzy holiday spirit alive by spreading ripples of kindness through our year-round Purple Acts of Kindness program. You In?
Happy New Year to all our readers! Welcome to 2010 and the 174th edition of Log Buffer, the weekly review of database blogs.
MySQL
The MySQL ’sphere since the holidays has been thick with posts on the matter of Oracle’s purchase of Sun, and thereby of MySQL. And in particular, there’s been a lot of talk about MySQL founder Monty Widenius’s response. I call all of this the . . .
Monty My-Thon
On the 28th of December, Monty framed the issue thus: Help keep the Internet free.
Singer Wang of Pythian, in reply, offers his perspective on GPL/ASL/BSD License Misconceptions and MySQL.
On Poo-tee-weet, Lukas Kahwe Smith is heard to say, Come on Monty . . . “What on earth is Monty . . . thinking? How can you spin around 180 and expect to come of believable? How can suddenly the GPL be the wrong choice? How can suddenly OSS depend on proprietary sales?”
On the WireLust blog, Terrence Curran writes, Monty Widenius is trying to regain control of MySQL and why this is bad for OSS.
Kristian Nielsen shares some Oracle speculations, stating, “I think it is basically a matter of obtaining control over MySQL.”
Antony Curtis throws in his two cents: “The topic of today is [Monty's] ‘Save MySQL’ campaign and how I believe it is unnecessary. . . . In fact, I believe that it could be harmful.”
All that aside, things keep rolling, and DBAs keep DBAing. Simon Mudd shared his thoughts and some suggestions on managing MySQL grants.
On someGreatTechName, Piotr Jasiulewicz shows how to get data without reading it – the power of covering indexes in MySQL.
Geert Vanderkelen, Some Abstract Type, has the coolest-looking rows ever in his post, A chessboard in MySQL: make your moves.
Peter Zaitsev of the MySQL Performance Blog lays out the principles of upgrading MySQL, “ . . . a very interesting task as you can approach it with so much different ‘depth’. For some this is 15 minutes job for others it is many month projects. Why is that?”
SQL Server
On the SQL Server side, Aaron Bertrand likewise shares his experiences upgrading 2005 => 2008, describing the steps he took in his careful crossover.
Aaron also has a quick poll: what is your favorite Management Studio tip or trick?
Dan Jones has a question too: “ . . . there are two types of DBAs: those who are myopic and those who are leaders.” What kind of DBA are you?
Simon Sabin wonders aloud, Should PASS hold the conference on the East coast?
Simon also has his latest TSQL challenge – remove duplicates from a string.
If that’s not enough T-SQL for you, Adam Machanic has issued his invitation for T-SQL Tuesday #002.
Oracle
Let’s begin with nothing. Tanel Poder wishes to remind us that NULL is not zero!, with, “an example [of] how misunderstanding NULLs may cause your application to return different results than what was intended.”
Martin Widlake has been busy decoding high_value and low_value for us. He writes, “The table DBA_TAB_COLUMNS holds the LOW_VALUE and HIGH_VALUE for columns. This information is potentially very useful to us . . . What is not so helpful is that Oracle stores, and displays, the information in an internal raw format. Which is utterly unhelpful to us of course.”
Jonathan Lewis clarifies copy stats. “ . . . someone was having trouble,” he writes, “copying stats from one index to another using the import_index_stats and export_index_stats procedures from package dbms_stats modifying the contents of their ’stat table’ between the export and import. . . . Part of the problem with this approach is that you’re not really supposed to do what they were trying to do . . . ”
Guy Harrison, meanwhile, elucidates 11gR2’s IGNORE_ROW_ON_DUPKEY_INDEX hint, “ . . . [one] of the strangest new features in 11GR2 . . . Why is this so strange? Mainly because unlike almost all other hints, this hint has a semantic effect: it changes the actual behavior – not just the optimization – of the SQL.”
Walking in the footsteps of Vasco da Gama, Luis Moreno Campos proclaims himself the first Portuguese Oracle nerd to unpack an Oracle-Sun Exadata v2. Congratulations, Luis! I think.
And we close this edition of Log Buffer with Jonathan Lewis, who on first principles, “ . . . just had to start the new year with a little humour,” courtesy of Og, Sumerian DBA.
That’s all for now. If I’ve missed your favourite DB blog from the last week, please leave a comment. See you again for LB #175.
This tweet from former MySQL AB CEO Mårten Mickos caught my eye. It shows a trend of increasing demand for MySQL and PostgreSQL expertise, while postings on job websites for those with Oracle and Ingres expertise declined.
I was a little shocked by the graph, until I realized it's just a trend. I then looked at the absolute number of jobs requesting database expertise, and the story quickly became clear….
So if you are looking for a DB admin job, the overall number of jobs in that sector is overwhelmingly Oracle jobs, but the growth is in MySQL and PostgreSQL expertise.
Technorati Tags: Database, Ingres, Mårten Mickos, MySQL, Oracle, Oracle Database, PostgreSQL
Google unveils the Nexus One. RMS explains his position on dual licensing. And more.
Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”
For the latest on Oracle’s acquisition of MySQL via Sun, see Everything you always wanted to know about MySQL but were afraid to ask
# Google launched the Nexus One Android phone.
# Richard Stallman explained his position on selling exceptions to the GNU GPL.
# Novell’s chief technology and strategy officer for open source, Nat Friedman, left the company.
# Dirk Riehle made “The Economic Case for Open Source Foundations”.
# InformationWeek published an interview with Red Hat CEO, Jim Whitehurst, on the recession, virtualization and Steve Ballmer.
# Lenovo introduced the Lenovo Skylight, a Linux- and ARM-based smartbook device and showcased the IdeaPad U1 hybrid notebook, which is both a Windows notebook and a Linux tablet.
# The Palm webOS developer program is now open. Membership fee waived for developers of open source apps.
# Dave Rosenberg speculated on why we have not seen more open source acquisitions.
# MSI and Novell announced the upcoming availability of SUSE Moblin preloaded on the MSI U135 netbook.
# Lucid Imagination released its LucidWorks Certified Distribution for Solr 1.4.
# Acquia announced that it grew to 400 paying customers in 2009.
# Jaspersoft’s CEO, Brian Gentile, targeted 50% growth in 2010, following 60% in 2009.
# Andy Updegrove reviewed the CodePlex Foundation’s progress, while Sam Ramji reflected on its first 120 days.
# Wipro joined the Open Handset Alliance.
# WaveMaker claimed to have doubled annual revenues and achieved profitability in 2009.
# Likewise predicted at least 100% sales growth in 2010.
If you’ve wondered whether a MySQL-based column database can hold its own against one of the big, leading row-based databases in a benchmark with terabytes of data and serious queries, well, now you at least have some information to consider. We recently commissioned one of the top data warehouse gurus out there – Bert Scalzo – to test InfiniDB against a leading row-based database to see how well we’d do. Bert’s an expert not only in data warehousing but also in Read More...
A couple of posts have been published recently worrying about the impact of more open source specialist vendors being acquired by proprietary vendors.
This is an issue that crops up occasionally, usually when a major acquisition has been announced. The current questioning seems to be driven by the ongoing saga of Oracle-Sun-MySQL, as well as the rumoured purchase of Zimbra by VMware.
While fear of the unknown is understandable, to my mind the concern about open source specialists being acquired by proprietary vendors is driven by parochialism and misplaced assumptions about the rate of acquisitions and the acquiring company’s intentions.
For a start the statistics suggest that acquisitions involving open source vendors have declined in recent years (contrary to our expectations to be honest). According to our preliminary figures there were 24 M&A deals involving open source vendors in 2009, compared to 29 in 2008 and 35 in 2007. Dave Rosenberg makes the case that we have seen less open source M&A than we might have expected.
There is always the fear, however, that a proprietary vendor could acquire an open source rival in order to shut it down. This is a theory we at The 451 Group investigated last year via a TechDealMaker service report asking “Could an open source project survive a hostile acquisition?” (451 clients can access the report here).
Looking at the history of M&A involving open source vendors we were unable to identify a single example of a proprietary vendor acquiring an open source project in order to kill it off.
Another significant fear involving open source acquisitions is that the acquiring company will suddenly change the licensing and/or pricing in order to generate revenue from users of the open source project.
To me this is a fear based on a false assumption that the only way to monetize open source is directly. If we look at the strategies used by proprietary vendors to generate revenue from open source (as we did in our Market Insight Service report “How third parties generate revenue from open source“, which was itself adapted from our Open Source is Not a Business Model CAOS report) we find that they are more likely to do so indirectly via complementary products and services.
In contrast open source specialist vendors have no choice but to attempt to monetize the open source software directly, either through support or proprietary licensed add-ons, and we have observed that this creates an inherent tension.
There is also a false assumption that open source specialist vendors are more committed to an open source “philosophy”. Some are, to be sure, but some simply see open source as a means to an end - treating it as a license tactic that disrupts competitors and expands potential adoption. There is nothing inherently wrong with that, but it does mean that for a great many open source “projects” the idea of the development community is a myth.
As previously discussed, Matt Asay noted last year that “vendors that have proprietary selling points elsewhere don’t need to control open-source code.”
In fact, I would suggest that vendors with proprietary selling points elsewhere have more to gain from releasing control of an open source project. Dirk Riehle explained the financial benefits this week with his Economic Case for Open Source Foundations, including sharing development expenses, increasing profits per sale, increasing sales, and expanding the addressable market.
The fact that proprietary vendors have proprietary selling points elsewhere means that they are also in a better financial position to trade control for community via a foundational approach, in contrast to open source specialists.
There may well be situations where the acquisition of an open source specialist by a proprietary vendor gives cause for concern, but I believe it is wrong to assume that the impact will be negative. While many open source specialists might have something to fear from increased M&A activity, in the broader context open source software potentially has more to gain from the increased involvement of proprietary vendors than it has to lose.
Last week I took SQLAlchemy v0.6 out of its trunk and tested it again with our MySQL Connector/Python. And surprise! SQLAlchemy comes with a 'dialect' supporting it! Nice!
However, we're still a long way from making it work. I found some additional problems which need to be fixed first.
Oh, there were more errors and failures due to bugs in MySQL Connector/Python. There have already been some modifications to the dialect (e.g. a name change), and there might be more.
If we get this sorted, TurboGears should work too!
In my previous rounds with data-warehouse-oriented engines I used a single table without joins, and a small (for a DW) data size (see http://www.mysqlperformanceblog.com/2009/10/02/analyzing-air-traffic-performance-with-infobright-and-monetdb/, http://www.mysqlperformanceblog.com/2009/10/26/air-traffic-queries-in-luciddb/, http://www.mysqlperformanceblog.com/2009/11/02/air-traffic-queries-in-infinidb-early-alpha/). To address these issues, I took the Star Schema Benchmark, a TPC-H modification, and ran its queries against InfoBright, InfiniDB, LucidDB and MonetDB. I did not get results for MonetDB; I will explain why later. Again, the primary goal of the test was not just to get numbers, but to understand the specifics of each engine and its ability to handle this amount of data and execute the queries.
All the details I have are available on our Wiki http://www.percona.com/docs/wiki/benchmark:ssb:start and you can get the benchmark specification at http://www.percona.com/docs/wiki/_media/benchmark:ssb:starschemab.pdf.
I generated data with scale factor = 1000, which gave me 610GB of data in raw format, and loaded it into each engine.
Here the differences between the engines come into play. While InfoBright and InfiniDB do not need indexes at all (you actually can't create indexes there), LucidDB and MonetDB require them, and this changes the load time and the data size after load significantly. The numbers I put in the results do not include indexing time, but it should also be considered.
Indexes are exactly the reason why I could not get results for MonetDB: I ran into an issue I was not prepared for. MonetDB requires that an index fit into memory during the indexing procedure, and for 610GB of data the index can reach 120GB, and I did not have that amount of memory (the box has only 16GB of RAM). The MonetDB experts recommended extending the swap partition to 128GB, but my partition layout was not prepared for that; I simply did not expect to need such a big swap partition.
Loading
So load time.
InfiniDB can really utilize all available cores/CPUs in the system (I ran the benchmark on an 8-core box), which allowed it to load data faster than the other engines. Though LucidDB and MonetDB also have multi-threaded loaders, InfoBright ICE used only a single core.
InfiniDB: 24 010 sec
MonetDB: 42 608 sec (without indexes)
InfoBright: 51 779 sec
LucidDB: 140 736 sec (without indexes)
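Dividing the 610GB raw size by these load times gives a rough sense of effective load throughput. This back-of-the-envelope arithmetic is mine, not part of the original benchmark:

```python
# Rough effective load throughput, derived from the 610GB raw size and the
# load times above (my own back-of-the-envelope numbers, not Percona's).
RAW_MB = 610 * 1024  # 610GB in MB

load_times = {          # seconds, from the list above
    "InfiniDB":   24010,
    "MonetDB":    42608,  # without indexes
    "InfoBright": 51779,
    "LucidDB":   140736,  # without indexes
}

for engine, secs in load_times.items():
    print(f"{engine:10s} {RAW_MB / secs:6.1f} MB/s")
```

This works out to roughly 26 MB/s for InfiniDB down to about 4.4 MB/s for LucidDB.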
I should note that the time to create indexes in LucidDB was also significant and exceeded the loading time. The full report on indexes is available here: http://www.percona.com/docs/wiki/benchmark:ssb:luciddb:start
Data size
Size after load is also an interesting factor. InfoBright is traditionally good at compression, though the compression rate is lower than in the AirTraffic case. I was told this is because the lineorder table does not come in sorted order, which is what one would expect in real life. I heard the same complaint from the InfiniDB experts: if the lineorder data were put in sorted order, loading time could decrease significantly.
Datasize after load:
InfoBright: 112G
LucidDB: 120GB (without indexes)
InfiniDB: 626GB
MonetDB: 650GB (without indexes)
Query times
Now on to query times. The full results are available on the page http://www.percona.com/docs/wiki/benchmark:ssb:start, and a graph is below. A couple of comments from me.
InfoBright was fully CPU-bound on a single core during all queries. I think that being able to use only a single CPU/core is becoming a significant limitation for the engine. For query 3.1 I got a surprising result: after 36 hours of work I got an error that the query could not be resolved by the InfoBright optimizer and that I needed to enable the MySQL optimizer.
InfiniDB, in contrast, was IO-bound and processed data using fully sequential reads at 120MB/s. I think this is what allowed InfiniDB to get the best time on most queries.
LucidDB at this stage can also utilize only a single thread, with results sometimes better and sometimes worse than InfoBright's.
Results:
Query  InfoBright                                                          InfiniDB                          LucidDB
Q1.1   48 min 21.67 sec (2901.67 sec)                                      24 min 26.05 sec (1466.05 sec)    3503.792 sec
Q1.2   44 min 55.37 sec (2695.37 sec)                                      24 min 25.83 sec (1465.83 sec)    2889.903 sec
Q1.3   45 min 53.49 sec (2753.49 sec)                                      24 min 27.25 sec (1467.25 sec)    2763.464 sec
Q2.1   1 hour 54 min 27.74 sec (6867.74 sec)                               19 min 44.35 sec (1184.35 sec)    9694.534 sec
Q2.2   1 hour 13 min 33.15 sec (4413.15 sec)                               19 min 49.56 sec (1189.56 sec)    9399.965 sec
Q2.3   1 hour 8 min 23.41 sec (4103.41 sec)                                19 min 52.27 sec (1192.25 sec)    8875.349 sec
Q3.1   NA                                                                  19 min 11.23 sec (1151.23 sec)    16376.93 sec
Q3.2   3 hours 30 min 17.64 sec (12617.64 sec)                             19 min 28.55 sec (1168.55 sec)    5560.977 sec
Q3.3   2 hours 58 min 18.87 sec (10698.87 sec)                             19 min 58.29 sec (1198.29 sec)    2517.621 sec
Q3.4   1 hour 41 min 41.29 sec (6101.29 sec)                               12 min 57.96 sec (777.96 sec)     686.202 sec
Q4.1   8 hours 53 min 52.55 sec (32032.55 sec)                             32 min 57.49 sec (1977.49 sec)    19843.213 sec
Q4.2   5 hours 38 min 7.60 sec / 5 hours 36 min 35.69 sec (20195.69 sec)   33 min 35.45 sec (2015.45 sec)    15292.648 sec
Q4.3   12 hours 58 min 4.27 sec (46684.27 sec)                             33 min 47.32 sec (2027.32 sec)    7241.791 sec
Graph with results (time in sec, less time is better)
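The parenthesized totals in the results can be checked mechanically; a tiny helper of my own, for illustration:

```python
# Tiny helper (mine, for illustration) to check the parenthesized totals in
# the results, e.g. "48 min 21.67 sec" -> 2901.67 sec.
def to_seconds(hours=0, minutes=0, seconds=0.0):
    return hours * 3600 + minutes * 60 + seconds

# Q1.1 InfoBright: 48 min 21.67 sec
assert abs(to_seconds(minutes=48, seconds=21.67) - 2901.67) < 1e-6
# Q2.1 InfoBright: 1 hour 54 min 27.74 sec
assert abs(to_seconds(hours=1, minutes=54, seconds=27.74) - 6867.74) < 1e-6
```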
Conclusions
Entry posted by Vadim | 11 comments
Over the years, Yahoo! Labs has partnered closely with many universities and colleges. As a company, we want to invent the sciences needed for the next-generation Internet. Working with the faculty, researchers, and students of the world’s great universities is one of the best ways to meet that goal.
Looking back, 2009 was our most successful year to date. Through a variety of programs, we brought together a lot of smart folks and important ideas. As we head into 2010, we want to share our excitement about these programs and events.
Hack U:
Our University Hack Day competitions (Hack U) brought Yahoo!’s open technology, top developers, and hack spirit to campuses for technical talks and a 24-hour student programming competition. Students from 11 universities in the U.S., Canada, and India built more than 300 hacks. Winners from each campus participated in the New York Open Hack Day, ultimately winning 6 of the 11 categories.
Yahoo! Big Thinkers Series:
As part of our Big Thinkers distinguished speaker series, Yahoo! brought the brightest minds in the academic community to our campus for talks, and we’ve made those lectures available on the Web for all to see and enjoy. We had talks on
Next year’s Big Thinkers series should be just as captivating — we’ve got a great lineup for 2010.
Webscope:
Our Webscope™ program offers a reference library of 24 interesting and scientifically useful datasets we’ve made available for noncommercial use by academics and other scientists. More than 880 academic researchers have used the datasets, resulting in 28 technical papers, journal articles and theses so far. Very few companies have the resources and global scale to help academics and students interact with the types of real-world datasets it takes to spark innovation, but Yahoo! is one of them and it’s really paid off.
Key Scientific Challenges (KSC):
Our Key Scientific Challenges program partnered with 21 graduate students in 2009. We gave them scholarships, plus the opportunity to work closely with Yahoo!’s scientists on solving some of the biggest challenges the Web offers. The winners attended the KSC Graduate Student Summit, where they presented the fruits of their labor to fellow students and other Yahoo! researchers. It was the first of what will surely be many peer-reviewed conferences for these talented folks.
Cloud Computing Research:
Our M45 Cloud Computing Cluster is a 4,000-processor testbed being used in academia for the advancement of cloud computing research and education. Faculty members at Carnegie Mellon University have written 40 technical publications based on research performed on M45, and the cluster is now being used by three additional universities. Yahoo! is also part of both the Open Cirrus testbed and the Open Cloud Consortium.
Global Impact – Yahoo! Days in Haifa:
In November we held two very successful “Yahoo! Days” at major Israeli universities – Tel Aviv University and Technion, the Israel Institute of Technology. Yahoo! scientists from around the world joined the event, which featured a keynote by Prabhakar Raghavan, the head of Yahoo! Labs. Pictures from the Tel Aviv and Technion events are available on Flickr.
Netflix Prize-winner Yehuda Koren’s Campus Tour:
Yehuda Koren, a research scientist with Yahoo! Labs in Israel, was part of a team that spanned countries, time zones, and companies, collaborating over a three-year period to win the Netflix Prize, one of the most well-publicized and interesting machine learning contests ever conceived. It’s a story with surprises, twists and turns, game-playing, late nights, and computational brute force. There’s also deep science behind it all — science that will drive future innovation on the Web. Naturally, Yahoo! thought it was the kind of story that students and faculty at some of the world’s best universities would like to hear in person. Yehuda has visited seven universities since the award was announced in September 2009, including MIT, Harvard, Carnegie Mellon, Columbia, Rutgers, Princeton, Penn, and GA Tech. And he’ll be back at it in 2010.
Purple Footprints:
And, of course, we continued to sponsor our campus seminar series, leaving “purple footprints” at Carnegie Mellon University, MIT, and University of Illinois at Urbana-Champaign. All of those seminars have been recorded and are available online. Yahoos also presented more than 100 lectures, seminars, workshops and training sessions at campuses worldwide.
If you want to learn more about Yahoo! Academic Relations programs and how you can participate in 2010, please visit us.
Ron Brachman
Vice President
Yahoo! Labs