<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>PlanetMysql.ru - information about the MySQL DBMS &#187; facebook</title>
	<atom:link href="http://planetmysql.ru/category/facebook/feed/" rel="self" type="application/rss+xml" />
	<link>http://planetmysql.ru</link>
	<description>A blog about MySQL, the most popular DBMS</description>
	<lastBuildDate>Fri, 10 Feb 2012 22:53:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
		<item>
		<title>Can the People&#8217;s House become a social platform for the people?</title>
		<link>http://feedproxy.google.com/~r/oreilly/radar/atom/~3/M39nyP7Edrs/congressional-hackathon-2011.html?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=can-the-peoples-house-become-a-social-platform-for-the-people</link>
		<comments>http://feedproxy.google.com/~r/oreilly/radar/atom/~3/M39nyP7Edrs/congressional-hackathon-2011.html#comments</comments>
		<pubDate>Mon, 12 Dec 2011 18:30:00 +0000</pubDate>
		<dc:creator>O'Reilly Radar</dc:creator>
				<category><![CDATA[civic apps]]></category>
		<category><![CDATA[Congress]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[gov 2.0]]></category>
		<category><![CDATA[hackathon]]></category>
		<category><![CDATA[open data]]></category>
		<category><![CDATA[open government]]></category>
		<category><![CDATA[Social media]]></category>

		<guid isPermaLink="false">http://planetmysql.ru/?guid=bd5998780d8066fb9ff6ab741a6037a1</guid>
		<description><![CDATA[InSourceCode developers work on "Madison" with volunteers.

There wasn't a great deal of hacking, at least in the traditional sense, at the "first congressional hackathon." Given the general shiver that the word still evokes in many a Washingtonian in 2011, that might be for the best. The attendees gathered together in the halls of the United States House of Representatives didn't create a more interactive visualization of how laws are made or a mobile health app. As open government advocate Carl Malamud observed, the "hack" felt like something even rarer in the "Age of the App for That:" 


Impressed @MattLira pulled off a truly bipartisan tech event on the hill. *that* is a true hack. #inhackwetrust&#8212; Carl Malamud (@carlmalamud) December 7, 2011


In a time when partisanship and legislative gridlock have defined Congress for many citizens, seeing the leadership of the United States House of Representatives agree on the importance of using the power of data and social networking to open government was an early Christmas present. 

"Increased access, increased connection with our constituents, transparency, openness is not a partisan issue," said House Majority Leader Eric Cantor. 

"The Republican leader and I may debate vigorously on many issues, but one area where we strongly agree is on making Congress more transparent and accessible," said House Democratic Whip Steny Hoyer in his remarks. "First, Congress took steps to open up the Capitol building so citizens can meet with their representatives and see the home of their legislature. In the same way, Congress is now taking steps to update how it connects with the American people online." 

An open House

While the event was branded as a "Congressional Facebook Developer Hackathon," what emerged more closely resembled a loosely organized conference or camp. 

Facebook executives and developers shared the stage with members of Congress to give keynotes to the 200 or so attendees before everyone broke into discussion groups to talk about constituent communications, press relations and legislative data. The event might be more aptly described as a "wonk-a-thon," as Sunlight Foundation's Daniel Schuman put it last week. 

This "hackathon" was organized to have some of the feel of an unconference, in the view of Matt Lira, digital director for the House Majority Leader. Lira sat down for a follow-up interview last Thursday. 

"There's a real model to CityCamp," he said. "We had 'curators' for the breakout. Next time, depending on how we structure it, we might break out events that are designed specifically for programming, with others clustered around topics. We want to keep it experimental."

Why? "When Aneesh Chopra and I did that session at SXSW, that personally for me was what tripped my thinking here," said Lira. "We came down from the stage and formed a circle. I was thinking the whole time that it would have been a waste of intellectual talent to have Tim O'Reilly and Clay Shirky in the audience instead of engaging in the conversation. I was thinking I never want to do a panel again. I want it to be like this." 

Part of the challenge, so to speak, of Congress hosting a hackathon in the traditional sense, with judging and prizes, lies in procurement rules, said Lira. "There are legal issues around challenges or prizes for Congress," he explained. "They're allowed in the executive branch, under DARPA, and now every agency under the COMPETES Act. We can't choose winners or losers, or give out prizes under procurement rules." 

Whatever you call it, at the end of the event, discussion leaders from the groups came back and presented on the ideas and concepts that had been hashed out. You can watch a short video that EngageDC produced for the House Majority Leader's office below: 



What came out of this unprecedented event, in other words, won't necessarily be measured in lines of code. It's that Congress got geekier. It's that the House is opening its doors to transparency through technology.

Given the focus on Facebook, it's not surprising that social media took center stage in many of the discussions. The idea for it came from a trip to Silicon Valley, where Representative Cantor said he met with Facebook founder Mark Zuckerberg and COO Sheryl Sandberg, and discussed how to make the House more social. After that conversation, Lira and Steve Dwyer, director of online communications and technology for the House Democratic Whip, organized the event. 

For a sense of the ideas shared by the working groups, read the story of the first congressional "hackathon" on Storify.

"For government, I don't think we could have done anything more purposeful than this as a first meeting," said Lira in our interview. "Next, we'll focus on building this group of people, strengthening the trust, which will prove instrumental when we get into the pure coding space. I have 100% confidence that we could do a programming-only event now and would have attendance." 

A Likeocracy in alpha

As the Sunlight Foundation's John Wonderlich observed earlier this year, access to legislative data brings citizens closer to their representatives. 

"When developers and programmers have better access to the data of Congress, they can better build the databases and tools that let the rest of us connect with the legislature," he wrote. 

If more open legislative data goes online, when we talk about what's trending in Congress, those conversations will be based upon insight into how the nation is reacting to them on social networks, including Facebook, Twitter, and Google+.  

Facebook developers Roddy Lindsay, Tyler Brock, Eric Chaves, Porter Bayne, and Blaise DiPersia coded up a simple proof of concept of what making legislative data social might look like. "LikeOcracy" pulls legislation from a House XML feed and makes it more social. The first version added Facebook's ubiquitous "Like" buttons to bill elements. A second version of the app adds more opportunities for reaction by integrating ReadrBoard, which enables users to rate sections or individual lines as "Unnecessary, Problematic, Great Idea or Confusing." You can try it out on three sample bills, including the Stop Online Piracy Act.

Would "social legislation" in a Facebook app catch on? The growth of civic startups like PopVox, OpenCongress and Votizen suggests that the idea has legs. [Disclosure: Tim O'Reilly was an early angel investor in PopVox.]

Likeocracy doesn't tap into Facebook's Open Graph, but it does hint at what integration might look like in the future. Justin Osofsky, Facebook's director of platform partnerships, described how the interests of constituents could be integrated with congressional data under Facebook's new Timeline. Citizens might potentially be able to simply "subscribe" to a bill, much like they can now for any web page, if Facebook's "Subscribe" plug-in was applied to the legislative process. 

Opening bill markup online

The other app presented at the hackathon came not from the attendees but from the efforts of InSourceCode, a software development firm that's also coded for Congressman Mike Pence and the Republican National Committee. 

Rep. Darrell Issa, chairman of the House Committee on Oversight and Government Reform, introduced the beta version of MADISON on Wednesday, a new online tool to crowdsource legislative markup. The vision is that MADISON will work as a real-time markup engine to let the public comment on bills as they move through the legislative process. "The assumption is that legislation should be open in Congress," said Issa. "It should be posted, interoperable and commented upon." 

As Nick Judd reported at techPresident, the first use of MADISON is to host Issa and Sen. Ron Wyden's "OPEN bill," which debuted on the app. Last week, the congressmen released the Online Protection and Enforcement of Digital Trade Act (OPEN) at Keepthewebopen.com.  The OPEN legislation removes one of the most controversial aspects of SOPA, using the domain name system for enforcement, and instead places authority with the International Trade Commission to address enforcement of IP rights on websites that are primarily infringing upon copyright. 

Issa said that his team had looked at the use of wikis by Rep. John Culberson, who put the healthcare reform bill online in a wiki. "There are some problems with editors who are not transparent to all of us," said Issa. "That's one of the challenges. We want to make sure that if you're an editor, you're a known editor." 

MADISON includes two levels of authentication: email for simple commenting and a more thorough vetting process for organizations or advocacy groups that wish to comment. "Like most things that are a 1.0 or beta, our assumption is that we'll learn from this," said Issa. "Some members may choose to have an active dialog. Others may choose to have it be part of pre-markup record." 

Issa fielded a number of questions on Wednesday, including one from web developer Brett Stubbs: "Will there be open access or an API? What we really want is just data." Issa indicated that future versions might include that. 

Jayson Manship, the "chief nerd" at InSourceCode, said that MADISON was built in four days. According to Manship, the idea came from conversations with Issa and Seamus Kraft, director of digital strategy for the House Committee on Oversight and Government Reform. MADISON is built with PHP and MySQL, and hosted in RackSpace's cloud so it can scale with demand, said Manship. 

"It's important to be entrepreneurial," said Lira in our interview. "There are partners throughout institutions that would be willing to do projects of different sizes and scopes. MADISON is something that Issa and Seamus wanted to do.  They took it upon themselves to get the ball rolling. That's the attitude we need." 

"We're working to hold the executive accountable to taxpayers," said Kraft last week. "Opening up what we do here in these two halls of Congress is equally important. MADISON is our first shot at it. We're going to need a lot of help to make it better." 

Kraft invited the remaining developers present to come to the Rayburn Office Building, where Manship and his team had brought in half a dozen machines, to help get MADISON ready for launch.  While I was there, there were conversations about decisions, plug-ins and ideas about improving the interface or functionality, representing a bona fide collaboration to make the app better. 

There's a larger philosophical issue relating to open government that Nick Judd touched upon over at techPresident in a follow-up post on MADISON: 

The terms for the site warn the user that anything they write on it will become public domain &#8212; but the code itself is proprietary. Meanwhile, OpenCongress' David Moore points out that the code that powers his organization's website, which also allows users to comment on individual provisions of bill text, is open source and has been available for some time. In theory, this means the Oversight staff could have started from that code and built on it instead of beginning from scratch. The code being proprietary means that while people like Moore might be able to make suggestions, they can't just download it, make their own changes and submit them for community review &#8212; which they'd happily do at little or no cost for a project released under an open-source license.

As Moore put it, "Get that code on GitHub, we'll do OpenID, fix the design." 

When asked about whether the team had considered making MADISON code open source, Manship said  that "he didn't know, although they weren't opposed to it." 

While Moore welcomed MADISON, he also observed that OpenCongress has had open-source code for bill text commenting for years. 


@seamuskraft @mattlira glad to chat, will email. We see first step as liberating full #opengovdata (API &#38; bulk) for MADISON &#38; OC &#38; open Web.&#8212; David Moore (@ppolitics) December 9, 2011



The decision by Issa's office to fund the creation of an app that was already available as open-source software is one that's worth noting, so I asked Kraft why they didn't fork OpenCongress' code, as Judd suggests. "While there was no specific budget expense for MADISON, it was developed by the Oversight Committee," said Kraft.

"While we like and support OpenCongress' code, it didn't fit the needs for MADISON," Kraft wrote in an emailed statement. 

What's next is, so to speak, an "OPEN" question, both in terms of the proposed SOPA alternative and the planned markup of SOPA itself on December 15. The designers of OPEN are actively looking for feedback from the civic software development community, both in terms of what functionality exists now and what could be built in future iterations.

THOMAS.gov as a platform

What Moore and long-time open-government advocates like Carl Malamud want to see from Congress is more structural change: 


Re: #hackwetrust, while we do seek leg. version control, public bill markup isn't ultimate goal. Exhaustive #opengovdata &#38; open API is (2/2)&#8212; David Moore (@ppolitics) December 8, 2011


@MattLira @DarrellIssa @SeamusKraft MADISON is much-welcomed, but PPF's #opengov ultimate goal is open API for @THOMASdotgov -cc @digiphile.&#8212; David Moore (@ppolitics) December 9, 2011



They're not alone.  Dan Schuman listed many other ways the House has yet to catch up with 21st century technology:

We have yet to see bulk access to THOMAS or public access to CRS reports, important legislative and ethics documents are still unavailable in digital format, many committee hearings still are not online, and so on.

As Schuman highlighted, the Sunlight Foundation has been focused on opening up Congress through technology since the organization was founded. To wit: "There have been several previous collaborative efforts by members of the transparency community to outline how the House of Representatives can be more open and accountable, of which an enduring touchstone is the Open House Project Report, issued in May 2007," wrote Schuman.

The notion of making THOMAS.gov into a platform received high-level endorsement from a congressional leader when House Minority Whip Steny Hoyer remarked on how technology is affecting Congress, his caucus and open government in the executive branch:

For Congress, there is still a lot of work to be done, and we have a duty to make the legislative process as open and accessible as possible. One thing we could do is make THOMAS.gov &#8212; where people go to research legislation from current and previous Congresses &#8212; easier to use, and accessible by social media. Imagine if a bill in Congress could tweet its own status.

The data available on THOMAS.gov should be expanded and made easily accessible by third-party systems. Once this happens, developers, like many of you here today, could use legislative data in innovative ways. This will usher in new public-private partnerships that will empower new entrepreneurs who will, in turn, yield benefits to the public sector. 

One successful example is how cities have made public transit data accessible so developers can use it in apps and websites. The end result has been commuters saving time every day and seeing more punctual trains and buses as a result of the transparency. Legislative data is far more complex, but the same principles apply. If we make the information available, I am confident that smart people like you will use it in inventive ways.

Hoyer's specific citation of the growth of open data in cities and an ecosystem of civic applications based upon it is further confirmation that the Gov 2.0 meme is moving into the mainstream.

Making THOMAS.gov into a platform for bulk data would change what's possible for all civic developers. What I really want is "data on everything," Stubbs told me last week. "THOMAS is just a visual viewer of the internal stuff. If we could have all of this, we could do something with it. What I would like is a data broker. I'd like a RESTful API with all of the data that I could just query. That's what the government could learn from Facebook. From my point of view, I just want to pull information and compile it." 

If Hoyer and the House leadership would like to see THOMAS.gov act as a platform, several attendees at the hackathon suggested to me that Congress could take a specific action: collaborate with the Senate and send the Library of Congress a letter instructing it to provide bulk legislative data access to THOMAS.gov in structured formats so that developers, designers and citizens around the nation can co-create a better civic experience for everyone.

"The House administration is working on standards called for by the rule and the letter sent earlier this year," said Lira. "We think they will be satisfactory to people. The institutions of the House have been following through since the day they were issued. The first step was issuing an XML feed daily. Next year, there will be a steady series of incremental process improvements. When the House Administrative Committee issues standards, the House Clerk will work on them."

Despite the abysmal public perception of Congress, genuine institutional changes in the House of Representatives driven by the GOP embracing innovation and transparency are incrementally happening. As Tim O'Reilly observed earlier this year, the current leadership of the House on transparency is doing a better job than their predecessors. 

In April, Speaker Boehner and Majority Leader Cantor sent a letter to the House Clerk regarding legislative data release. Then, in September, a  live XML feed for the House floor went online. Yes, there's a long way to go on open legislative data quality in Congress. That said, there's support for open-government data from both the White House and the House.  

"My personal view is that what's important right now is that the House create the right precedents," said Lira. "If we create or adopt a data standard, it's important that it be the right standard." 
	
Even if open government is in beta, there needs to be more tolerance for experiments and risks, said Lira. "I made a mistake in attacking We the People as insufficient. I still believe it is, but it's important to realize that the precedent is as important as the product in government. In technology in general, you'll never reach an end. We the People is a really good precedent, and I look forward to seeing what they do. They've shown a real commitment, and it's steadily improving." 

A social Congress

While Sean Parker may predict that social media will determine the outcome of the 2012 election, governance is another story entirely. Meaningful use of social media by Congress remains challenged by a number of factors, not least an online identity ecosystem that has not provided Congress with ideal means to identify constituents online. The reality remains that when it comes to which channels influence Congress, in-person visits and individual emails or phone calls are far more influential with congressional staffers. 

As with any set of tools, success shouldn't be measured solely by media reports or press releases but by the outcomes from their use. The hard work of bipartisan compromise between the White House and Congress, to the extent it occurs, might seem unlikely to be publicly visible in 140 characters or less. 

"People think it's always an argument in Washington," said Lira in our interview. "Social media can change that. We're seeing a decentralization of audiences that is built around their interests rather than the interests of editors. Imagine when you start streaming every hearing and making information more digestible. All of a sudden, you get these niche audiences. They're not enough to sustain a network, but you'll get enough of an audience to sustain the topic. I believe we will have a more engaged citizenry as a result." 

Lira is optimistic. "Technology enables our republic to function better. In ancient Greece, you could only sustain a democracy the size of a city. Transportation technology limited that scope. In the U.S., new technologies enabled global democracy. As we entered the age of mass communication, we lost mass participation. Now with the Internet, we can have people more engaged again." 

There may be a 30-year cycle at play here. Lira suggested looking back to radio in the 1920s, television in the 1950s, and cable in the 1980s. "It hasn't changed much since; we're essentially using the same rulebook since the '80s. The changes made in those periods of modernization were unique." 

Thirty years on from the introduction of cable news, will the Internet help reinvigorate the founders' vision of a nation of, by and with the people? "I do think that this is a transformational moment," said Lira. "It will be for the next couple of years. When you talk to people &#8212; both Republicans and Democrats &#8212; you sense we're on the cusp of some kind of change, where it's not just communicating about projects but making projects better. Hearings, legislative government and executive government will all be much more participatory a decade from now."

In that sweep of history, the "People's House" may prove to be a fulcrum of change. "If any place in government is going to do it, it's the House," said Lira. "It's our job to be close to the public in a way that no other part of government is. In the Federalist Papers, that's the role of the House. We have an obligation to lead the way in terms of incorporating technology into real processes. We're not replacing our system of representative government. We're augmenting it with what's now possible, like when the telegraph let people know what the votes were faster." 

]]></description>
			<content:encoded><![CDATA[<p><img src="http://radar.oreilly.com/2011/12/12/1211-congress-hackathon.jpg" border="0" alt="Congressional hackathon" width="580" style="margin-bottom: 15px;" /><br />InSourceCode developers work on "Madison" with volunteers.</p>

<p>There wasn't a great deal of hacking, at least in the traditional sense, at the "first congressional hackathon." Given the general shiver that the word still evokes in many a Washingtonian in 2011, that might be for the best. The attendees gathered together in the halls of the United States House of Representatives didn't create a more interactive <a href="http://www.mikewirthart.com/wp-content/uploads/2010/05/howlawsmadeWIRTH2.jpg">visualization of how laws are made</a> or a <a href="http://radar.oreilly.com/2010/11/better-mobile-healthcare-decis.html">mobile health app</a>. As open government advocate Carl Malamud observed, the "hack" felt like something even rarer in the "Age of the App for That:" </p>

<div align="center">
<blockquote><p>Impressed <a href="https://twitter.com/MattLira">@MattLira</a> pulled off a truly bipartisan tech event on the hill. *that* is a true hack. <a href="https://twitter.com/search/%2523inhackwetrust">#inhackwetrust</a></p><p>&mdash; Carl Malamud (@carlmalamud) <a href="https://twitter.com/carlmalamud/status/144553094196367361" data-datetime="2011-12-07T23:05:17+00:00">December 7, 2011</a></p></blockquote>
</div>

<p>In a time when partisanship and legislative gridlock have defined Congress for many citizens, seeing the leadership of the United States House of Representatives agree on the importance of using the power of data and social networking to open government was an early Christmas present. </p>

<p>"Increased access, increased connection with our constituents, transparency, openness is not a partisan issue," said House Majority Leader Eric Cantor. </p>

<p>"The Republican leader and I may debate vigorously on many issues, but one area where we strongly agree is on making Congress more transparent and accessible," <a href="http://www.democraticwhip.gov/content/hoyer-remarks-first-congressional-facebook-developers-hackathon">said</a> House Democratic Whip Steny Hoyer in his remarks. "First, Congress took steps to open up the Capitol building so citizens can meet with their representatives and see the home of their legislature. In the same way, Congress is now taking steps to update how it connects with the American people online." </p>

<h2>An open House</h2>

<p>While the event was branded as a "<a href="http://majorityleader.gov/facebook">Congressional Facebook Developer Hackathon</a>," what emerged more closely resembled a loosely organized conference or camp. </p>

<p>Facebook executives and developers shared the stage with members of Congress to give keynotes to the 200 or so attendees before everyone broke into discussion groups to talk about constituent communications, press relations and legislative data. The event might be more aptly described as a "<a href="http://sunlightfoundation.com/blog/2011/12/05/house-holding-wonk-a-thon-on-public-access-to-congressional-info-this-wednesday/">wonk-a-thon</a>," as Sunlight Foundation's Daniel Schuman put it last week. </p>

<p>This "hackathon" was organized to have some of the feel of an unconference, in the view of Matt Lira, digital director for the House Majority Leader. Lira sat down for a follow-up interview last Thursday. </p>

<p>"There's a real model to <a href="http://citycamp.com">CityCamp</a>," he said. "We had 'curators' for the breakout. Next time, depending on how we structure it, we might break out events that are designed specifically for programming, with others clustered around topics. We want to keep it experimental."</p>

<p>Why? "When Aneesh Chopra and I did that session at SXSW, that personally for me was what tripped my thinking here," said Lira. "We came down from the stage and formed a circle. I was thinking the whole time that it would have been a waste of intellectual talent to have Tim O'Reilly and Clay Shirky in the audience instead of engaging in the conversation. I was thinking I never want to do a panel again. I want it to be like this." </p>

<p>Part of the challenge, so to speak, of Congress hosting a hackathon in the traditional sense, with judging and prizes, lies in procurement rules, said Lira. "There are legal issues around challenges or prizes for Congress," he explained. "They're allowed in the executive branch, under DARPA, and now every agency under the <a href="http://en.wikipedia.org/wiki/America_COMPETES_Act">COMPETES Act</a>. We can't choose winners or losers, or give out prizes under procurement rules." </p>

<p>Whatever you call it, at the end of the event, discussion leaders from the groups came back and presented on the ideas and concepts that had been hashed out. You can watch a short video that EngageDC produced for the House Majority Leader's office below: </p>

<p></p>

<p>What came out of this unprecedented event, in other words, won't necessarily be measured in lines of code. It's that <a href="http://techpresident.com/blog-entry/capitol-hills-dec-7-hackathon-means-governments-getting-geekier">Congress got geekier</a>. It's that the <a href="http://sunlightfoundation.com/blog/2011/12/08/in-hackwetrust-the-house-of-representatives-opens-its-doors-to-transparency-through-technology/%22">House is opening its doors to transparency through technology</a>.</p>

<p>Given the focus on Facebook, it's not surprising that social media took center stage in many of the discussions. The idea for it came from a trip to Silicon Valley, where Representative Cantor said he met with Facebook founder Mark Zuckerberg and COO Sheryl Sandberg, and discussed how to make the House more social. After that conversation, Lira and Steve Dwyer, director of online communications and technology for the House Democratic Whip, organized the event. </p>

<p>For a sense of the ideas shared by the working groups, read the <a href="http://storify.com/digiphile/the-first-congressional-hackathon">story of the first congressional "hackathon"</a> on Storify.</p>

<p>"For government, I don't think we could have done anything more purposeful than this as a first meeting," said Lira in our interview. "Next, we'll focus on building this group of people, strengthening the trust, which will prove instrumental when we get into the pure coding space. I have 100% confidence that we could do a programming-only event now and would have attendance." </p>

<h2>A Likeocracy in alpha</h2>

<p>As the Sunlight Foundation's John Wonderlich observed earlier this year, <a href="http://sunlightfoundation.com/blog/2011/04/29/speaker-boehner-and-majority-leader-cantor-on-legislative-data-release/">access to legislative data</a> brings citizens closer to their representatives. </p>

<p>"When developers and programmers have better access to the data of Congress, they can better build the databases and tools that let the rest of us connect with the legislature," he wrote. </p>

<p>If more open legislative data goes online, when we talk about what's trending in Congress, those conversations will be based upon insight into how the nation is reacting to them on social networks, including Facebook, Twitter, and Google+.  </p>

<p>Facebook developers Roddy Lindsay, Tyler Brock, Eric Chaves, Porter Bayne, and Blaise DiPersia coded up a simple proof of concept of what making legislative data social might look like. "<a href="http://likeocracy.org/">LikeOcracy</a>" pulls legislation from a House XML feed and makes it more social. The first version added Facebook's ubiquitous "Like" buttons to bill elements. A second version of the app adds more opportunities for reaction by integrating <a href="http://readrboard.com/">ReadrBoard</a>, which enables users to rate sections or individual lines as "Unnecessary, Problematic, Great Idea or Confusing." You can try it out on three sample bills, including the <a href="http://likeocracy.org/bill.php?id=h3261_ih">Stop Online Piracy Act</a>.</p>

<p>Would "social legislation" in a Facebook app catch on? The growth of civic startups like PopVox, OpenCongress and Votizen suggests that the idea has legs. <em>[Disclosure: Tim O'Reilly was an early angel investor in <a href="http://gov20.govfresh.com/popvox-tries-to-bring-the-voice-of-the-people-into-congress/">PopVox</a>.]</em></p>

<p>LikeOcracy doesn't tap into Facebook's Open Graph, but it does hint at what integration might look like in the future. Justin Osofsky, Facebook's director of platform partnerships, described how the interests of constituents could be integrated with congressional data under Facebook's new <a href="http://www.facebook.com/about/timeline">Timeline</a>. Citizens might be able to simply "subscribe" to a bill, much as they can now for any web page, if Facebook's "Subscribe" plug-in were applied to the legislative process. </p>

<h2>Opening bill markup online</h2>

<p>The other app presented at the hackathon came not from the attendees but from the efforts of InSourceCode, a software development firm that's also coded for Congressman Mike Pence and the Republican National Committee. </p>

<p>On Wednesday, Rep. Darrell Issa, chairman of the House Committee on Oversight and Government Reform, introduced the beta version of MADISON, a new online tool to <a href="http://techpresident.com/blog-entry/new-tool-crowdsource-legislative-markup-comes-us-house">crowdsource legislative markup</a>. The vision is that MADISON will work as a real-time markup engine to let the public comment on bills as they move through the legislative process. "The assumption is that legislation should be open in Congress," said Issa. "It should be posted, interoperable and commented upon." </p>

<p>As Nick Judd reported at techPresident, the first use of MADISON is to host Issa and Sen. Ron Wyden's "OPEN bill," <a href="http://techpresident.com/blog-entry/issawyden-open-bill-debuts-new-legislative-markup-tool">which debuted on the app</a>. Last week, the two lawmakers released the Online Protection and Enforcement of Digital Trade Act (OPEN) at <a href="http://keepthewebopen.com">Keepthewebopen.com</a>. The OPEN legislation removes one of the most controversial aspects of <a href="http://radar.oreilly.com/2011/11/sopa-protectip.html">SOPA</a>, the use of the domain name system for enforcement, and instead places authority with the International Trade Commission to address enforcement of IP rights on websites that are primarily infringing upon copyright. </p>

<p>Issa said that his team had looked at the use of wikis by Rep. John Culberson, who put the healthcare reform bill online in a wiki. "There are some problems with editors who are not transparent to all of us," said Issa. "That's one of the challenges. We want to make sure that if you're an editor, you're a known editor." </p>

<p>MADISON includes two levels of authentication: email for simple commenting and a more thorough vetting process for organizations or advocacy groups that wish to comment. "Like most things that are a 1.0 or beta, our assumption is that we'll learn from this," said Issa. "Some members may choose to have an active dialog. Others may choose to have it be part of pre-markup record." </p>

<p>Issa fielded a number of questions on Wednesday, including one from web developer <a href="http://www.linkedin.com/in/brettstubbs">Brett Stubbs</a>: "Will there be open access or an API? What we really want is just data." Issa indicated that future versions might include that. </p>

<p>Jayson Manship, the "<a href="http://www.linkedin.com/in/jaysonmanship">chief nerd</a>" at InSourceCode, said that MADISON was built in four days. According to Manship, the idea came from conversations with Issa and Seamus Kraft, director of digital strategy for the House Committee on Oversight and Government Reform. MADISON is built with PHP and MySQL, and hosted in Rackspace's cloud so it can scale with demand, said Manship. </p>

<p>"It's important to be entrepreneurial," said Lira in our interview. "There are partners throughout institutions that would be willing to do projects of different sizes and scopes. MADISON is something that Issa and Seamus wanted to do.  They took it upon themselves to get the ball rolling. That's the attitude we need." </p>

<p>"We're working to hold the executive accountable to taxpayers," said Kraft last week. "Opening up what we do here in these two halls of Congress is equally important. MADISON is our first shot at it. We're going to need a lot of help to make it better." </p>

<p>Kraft invited the remaining developers present to come to the Rayburn Office Building, where Manship and his team had brought in half a dozen machines, to help get MADISON ready for launch.  While I was there, there were conversations about decisions, plug-ins and ideas about improving the interface or functionality, representing a bona fide collaboration to make the app better. </p>

<p>There's a larger philosophical issue relating to open government that Nick Judd <a href="http://techpresident.com/blog-entry/issawyden-open-bill-debuts-new-legislative-markup-tool">touched upon</a> over at techPresident in a follow-up post on MADISON: </p>

<blockquote><p>The terms for the site warn the user that anything they write on it will become public domain &mdash; but the code itself is proprietary. Meanwhile, OpenCongress' David Moore <a href="https://twitter.com/#!/ppolitics/status/144838336069115904">points out</a> that the code that powers his organization's website, which also allows users to comment on individual provisions of bill text, is open source and has been available for some time. In theory, this means the Oversight staff could have started from that code and built on it instead of beginning from scratch. The code being proprietary means that while people like Moore might be able to make suggestions, they can't just download it, make their own changes and submit them for community review &mdash; which they'd happily do at little or no cost for a project released under an open-source license.</p></blockquote>

<p>As Moore <a href="https://twitter.com/#!/ppolitics/status/144861062968250368">put it</a>, "Get that code on GitHub, we'll do OpenID, fix the design." </p>

<p>When asked whether the team had considered making the MADISON code open source, Manship said that he didn't know, although they weren't opposed to it. </p>

<p>While Moore welcomed MADISON, he also <a href="https://twitter.com/#!/ppolitics/status/144838336069115904">observed</a> that OpenCongress has had open-source code for bill text commenting for years. </p>

<div align="center">
<blockquote data-in-reply-to="145127676439572480"><p><a href="https://twitter.com/seamuskraft">@seamuskraft</a> <a href="https://twitter.com/mattlira">@mattlira</a> glad to chat, will email. We see first step as liberating full <a href="https://twitter.com/search/%2523opengovdata">#opengovdata</a> (API &amp; bulk) for MADISON &amp; OC &amp; open Web.</p>&mdash; David Moore (@ppolitics) <a href="https://twitter.com/ppolitics/status/145154276354818049" data-datetime="2011-12-09T14:54:10+00:00">December 9, 2011</a></blockquote>

</div>

<p>The decision by Issa's office to fund the creation of an app that was already available as open-source software is one that's worth noting, so I asked Kraft why they didn't fork OpenCongress' code, as Judd suggests. "While there was no specific budget expense for MADISON, it was developed by the Oversight Committee," said Kraft.</p>

<p>"While we like and support OpenCongress' code, it didn't fit the needs for MADISON," Kraft wrote in an emailed statement. </p>

<p>What's next is, so to speak, an "OPEN" question, both in terms of the proposed SOPA alternative and the planned markup of SOPA itself on December 15. The designers of OPEN are actively looking for feedback from the civic software development community, both in terms of what functionality exists now and what could be built in future iterations.</p>

<h2>THOMAS.gov as a platform</h2>

<p>What Moore and long-time open-government advocates like Carl Malamud want to see from Congress is more structural change: </p>

<div align="center">
<blockquote><p>Re: <a href="https://twitter.com/search/%2523hackwetrust">#hackwetrust</a>, while we do seek leg. version control, public bill markup isn't ultimate goal. Exhaustive <a href="https://twitter.com/search/%2523opengovdata">#opengovdata</a> &amp; open API is (2/2)</p>&mdash; David Moore (@ppolitics) <a href="https://twitter.com/ppolitics/status/144838344109600769" data-datetime="2011-12-08T17:58:46+00:00">December 8, 2011</a></blockquote>


<blockquote data-in-reply-to="144968180215971840"><p><a href="https://twitter.com/MattLira">@MattLira</a> <a href="https://twitter.com/DarrellIssa">@DarrellIssa</a> <a href="https://twitter.com/SeamusKraft">@SeamusKraft</a> MADISON is much-welcomed, but PPF's <a href="https://twitter.com/search/%2523opengov">#opengov</a> ultimate goal is open API for <a href="https://twitter.com/THOMASdotgov">@THOMASdotgov</a> -cc <a href="https://twitter.com/digiphile">@digiphile</a>.</p>&mdash; David Moore (@ppolitics) <a href="https://twitter.com/ppolitics/status/145006804827512832" data-datetime="2011-12-09T05:08:11+00:00">December 9, 2011</a></blockquote>

</div>

<p>They're not alone.  Dan Schuman <a href="http://sunlightfoundation.com/blog/2011/12/05/house-holding-wonk-a-thon-on-public-access-to-congressional-info-this-wednesday/">listed</a> many other ways the House has yet to catch up with 21st century technology:</p>

<blockquote><p>We have yet to see <a href="http://sunlightfoundation.com/blog/2011/05/11/sunlight-testimony-bulk-access-to-thomas-and-access-to-crs-products/">bulk access to THOMAS or public access to CRS reports</a>, important <a href="http://sunlightfoundation.com/blog/2011/11/16/a-year-later-little-progress-on-digitizing-legislative-documents/">legislative</a> and <a href="http://sunlightfoundation.com/blog/2011/11/14/only-sunlight-can-lift-congress-ethics-cloud/">ethics</a> documents are still unavailable in digital format, many <a href="http://sunlightfoundation.com/blog/2011/05/10/hearing-on-the-houses-budget-will-not-be-televised-or-webcast/">committee hearings still are not online</a>, and <a href="http://sunlightfoundation.com/policy/documents/house-rules-proposals-112th-congress/">so on</a>.</p></blockquote>

<p>As Schuman highlighted, the Sunlight Foundation has been focused on opening up Congress through technology since the organization was founded. To wit: "There have been several previous collaborative efforts by members of the transparency community to outline how the House of Representatives can be more open and accountable, of which an enduring touchstone is <a href="http://www.theopenhouseproject.com/report/openhouseproject_may8_07.pdf">the Open House Project Report</a>, issued in May 2007," wrote Schuman.</p>

<p>The notion of making <a href="http://gov20.govfresh.com/make-thomas-gov-a-platform-suggests-house-minority-whip-steny-hoyer/">THOMAS.gov into a platform</a> received high-level endorsement from a congressional leader when House Minority Whip Steny Hoyer <a href="http://www.democraticwhip.gov/content/hoyer-remarks-first-congressional-facebook-developers-hackathon">remarked</a> on how technology is affecting Congress, his caucus and open government in the executive branch:</p>

<blockquote><p>For Congress, there is still a lot of work to be done, and we have a duty to make the legislative process as open and accessible as possible. One thing we could do is make <a href="http://thomas.gov">THOMAS.gov</a> &mdash; where people go to research legislation from current and previous Congresses &mdash; easier to use, and accessible by social media. Imagine if a bill in Congress could tweet its own status.</p>

<p>The data available on THOMAS.gov should be expanded and made easily accessible by third-party systems. Once this happens, developers, like many of you here today, could use legislative data in innovative ways. This will usher in new public-private partnerships that will empower new entrepreneurs who will, in turn, yield benefits to the public sector.</p> 

<p>One successful example is how cities have made public transit data accessible so developers can use it in apps and websites. The end result has been commuters saving time every day and seeing more punctual trains and buses as a result of the transparency. Legislative data is far more complex, but the same principles apply. If we make the information available, I am confident that smart people like you will use it in inventive ways.</p></blockquote>

<p>Hoyer's specific citation of the growth of open data in cities and an ecosystem of civic applications based upon it is further confirmation that the <a href="http://radar.oreilly.com/2011/12/gov-2-open-data-civic-apps-npr-ap.html">Gov 2.0 meme is moving into the mainstream</a>.</p>

<p>Making THOMAS.gov into a platform for bulk data would change what's possible for all civic developers. What he really wants is "data on everything," Stubbs told me last week. "THOMAS is just a visual viewer of the internal stuff. If we could have all of this, we could do something with it. What I would like is a data broker. I'd like a RESTful API with all of the data that I could just query. That's what the government could learn from Facebook. From my point of view, I just want to pull information and compile it." </p>

<p>If Hoyer and the House leadership would like to see THOMAS.gov act as a platform, several attendees at the hackathon suggested to me that Congress could take a specific action: collaborate with the Senate and send the Library of Congress a letter instructing it to provide <a href="http://sunlightfoundation.com/blog/2011/05/11/sunlight-testimony-bulk-access-to-thomas-and-access-to-crs-products/">bulk legislative data access to THOMAS.gov</a> in structured formats so that developers, designers and citizens around the nation can co-create a better civic experience for everyone.</p>

<p>"The House administration is working on standards called for by the rule and the letter sent earlier this year," said Lira. "We think they will be satisfactory to people. The institutions of the House have been following through since the day they were issued. The first step was issuing an XML feed daily. Next year, there will be a steady series of incremental process improvements. When the House Administrative Committee issues standards, the House Clerk will work on them."</p>

<p>Despite the abysmal public perception of Congress, genuine institutional changes in the House of Representatives driven by the <a href="http://www.nationaljournal.com/congress/will-the-gop-embrace-innovation-and-transparency--20101125">GOP embracing innovation and transparency</a> are incrementally happening. As Tim O'Reilly observed earlier this year, the current <a href="http://www.cato-at-liberty.org/house-leaderships-transparency-leadership/">leadership of the House on transparency</a> is doing a better job than their predecessors. </p>

<p>In April, Speaker Boehner and Majority Leader Cantor sent a letter to the House Clerk regarding <a href="http://sunlightfoundation.com/blog/2011/04/29/speaker-boehner-and-majority-leader-cantor-on-legislative-data-release/">legislative data release</a>. Then, in September, a <a href="http://sunlightlabs.com/blog/2011/house-revamps-floor-feed/">live XML feed for the House floor</a> went online. Yes, there's a <em>long</em> way to go on <a href="http://gov20.govfresh.com/cato-study-rating-congress-on-open-legislative-data-gives-low-grades-for-transparency/">open legislative data quality in Congress</a>. That said, there's support for <a href="http://www.huffingtonpost.com/alexander-howard/open-government-data-earn_b_1001505.html">open-government data</a> from both the White House and the House. </p>

<p>"My personal view is that what's important right now is that the House create the right precedents," said Lira. "If we create or adopt a data standard, it's important that it be the right standard." </p>
	
<p>Even if <a href="http://radar.oreilly.com/2010/09/the-prospects-for-an-open-gover.html">open government is in beta</a>, there needs to be more tolerance for experiments and risks, said Lira. "I made a mistake in attacking <a href="http://whitehouse.gov/wethepeople">We the People</a> as insufficient. I still believe it is, but it's important to realize that the precedent is as important as the product in government. In technology in general, you'll never reach an end. We The People is a really good precedent, and I look forward to seeing what they do. They've shown a real commitment, and it's steadily improving." </p>

<h2>A social Congress</h2>

<p>While Sean Parker may predict that <a href="http://techcrunch.com/2011/12/09/sean-parker-social-media-election/">social media will determine the outcome</a> of the 2012 election, governance is another story entirely. Meaningful use of social media by Congress remains challenged by a number of factors, not least an <a href="http://radar.oreilly.com/2011/05/nstic-analysis-identity-privacy.html">online identity</a> ecosystem that has not provided Congress with ideal means to identify constituents online. The reality remains that when it comes to which <a href="http://www.frogloop.com/care2blog/2011/2/3/latest-report-what-messaging-and-channels-influence-congress.html">channels influence Congress</a>, in-person visits and individual emails or phone calls are far more influential with congressional staffers than social media. </p>

<p>As with any set of tools, success shouldn't be measured solely by media reports or press releases but by the outcomes from their use. The hard work of bipartisan compromise between the White House and Congress, to the extent it occurs, might seem unlikely to be publicly visible in 140 characters or less. </p>

<p>"People think it's always an argument in Washington," said Lira in our interview. "Social media can change that. We're seeing a decentralization of audiences that is built around their interests rather than the interests of editors. Imagine when you start streaming every hearing and making information more digestible. All of a sudden, you get these niche audiences. They're not enough to sustain a network, but you'll get enough of an audience to sustain the topic. I believe we will have a more engaged citizenry as a result." </p>

<p>Lira is optimistic. "Technology enables our republic to function better. In ancient Greece, you could only sustain a democracy in the size of city. Transportation technology limited that scope. In the U.S., new technologies enabled global democracy. As we entered the age of mass communication, we lost mass participation. Now with the Internet, we can have people more engaged again." </p>

<p>There may be a 30-year cycle at play here. Lira suggested looking back to radio in the 1920s, television in the 1950s, and cable in the 1980s. "It hasn't changed much since; we're essentially using the same rulebook since the '80s. The changes made in those periods of modernization were unique." </p>

<p>Thirty years on from the introduction of cable news, will the Internet help reinvigorate the founders' vision of a nation of, by and <em>with</em> the people? "I do think that this is a transformational moment," said Lira. "It will be for the next couple of years. When you talk to people &mdash; both Republicans and Democrats &mdash; you sense we're on the cusp of some kind of change, where it's not just communicating about projects but making projects better. Hearings, legislative government and executive government will all be much more participatory a decade from now."</p>

<p>In that sweep of history, the "People's House" may prove to be a fulcrum of change. "If any place in government is going to do it, it's the House," said Lira. "It's our job to be close to the public in a way that no other part of government is. In the Federalist Papers, that's the role of the House. We have an obligation to lead the way in terms of incorporating technology into real processes. We're not replacing our system of representative government. We're augmenting it with what's now possible, like when the telegraph let people know what the votes were faster." </p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/12/12/can-the-peoples-house-become-a-social-platform-for-the-people/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>on MySQL replication prefetching</title>
		<link>http://dom.as/2011/12/03/replication-prefetching/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=on-mysql-replication-prefetching</link>
		<comments>http://dom.as/2011/12/03/replication-prefetching/#comments</comments>
		<pubDate>Sat, 03 Dec 2011 21:46:46 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[prefetch]]></category>
		<category><![CDATA[Replication]]></category>

		<guid isPermaLink="false">http://dom.as/?p=1536</guid>
		<description><![CDATA[For the impatient ones, or ones that prefer code to narrative, go here. This is long overdue anyway, and Yoshinori already beat me, hehe&#8230;
Our database environment is quite busy &#8211; there are millions of row changes a second and millions of I/O operations a second, and the impact of that can be felt at each shard. Especially as we also have to replicate to other datacenters, single-threaded replication on MySQL becomes a real bottleneck.
We use multiple methods to understand and analyze replication lag composition &#8211; simple replication thread state sampling via the MySQL processlist helps to understand logical workload components (and work in that field yields great results), and pstack/GDB-based replication thread sampling shows server internal behavior quite well too (a similar technique was used for accept thread visualisation).
The biggest problem with a single replication thread is that it has to read data to execute queries (rather than applying physical page deltas, like PG, or just appending to files, like HBase, it does logical edits to page data) &#8211; we can observe 95% of process time in that state. As there&#8217;s generally just one outstanding data read per replication thread, other workload hitting the machine will also make replication reads slower.
Generally, the obvious way to deal with slow I/O is to issue more outstanding parallel requests, and the only way to do that, apart from parallel replication, is to predict what will be needed in the future and try to fetch it.
Many, many moons ago Paul Tuckfield talked about the YouTube replication prefetcher &#8211; it would take write statements yet to be executed from the relay logs, convert them to SELECTs and run them before the replication thread needs that data. He still says that was one of his most satisfying quick hacks :-)
Maatkit (now Percona Toolkit) introduced mk-slave-prefetch (I played with it back in 2008, didn&#8217;t put it into operation at that time though), and eventually that looked like a reasonable option for prefetching statements on our database cluster.
5000 lines of Perl is not the easiest code to work with (or to debug), so the journey was quite bumpy. We got it working in some shape, eventually, but Baron, the original author, has something to say about it:
Please don&#8217;t use mk-slave-prefetch on MySQL unless you are Facebook. Or at least don&#8217;t tell your friends, so they won&#8217;t use it.
Anyway, our update rate would saturate mksp.pl if we used anything fancier on it, so it was a constant balancing act, in which looking at the code was something nobody wanted to do ;-) Still, it was (and is) helping us, so getting rid of it wasn&#8217;t possible either.
At some point we decided to try an experiment &#8211; what if we executed statements, then rolled them back? So I did a quick implementation of that method from scratch in Python &#8211; the resulting piece of code was relatively small and fun to experiment with.
There were multiple problems with such an approach &#8211; one complication was that queries were grabbing locks for the duration of the statement, and some of those locks would collide with what the actual replication thread was doing. Fixing that would require an immediate lock wait timeout or transaction kill for the prefetcher thread &#8211; so, a relatively deep dive into InnoDB. Another problem was internal InnoDB lock contention on rollbacks &#8211; that was an expensive operation, and the benefits of pages read in were negated by rollback segment lock contention. Fixing that is even more extensive InnoDB work (though probably some people would like their rollbacks to be efficient ;-)
At that moment we came up with the idea that the InnoDB codebase could be instrumented to not do any real work on updates &#8211; just page data in and return to the caller, and if any change accidentally slips in, commits can fail. That looked like a feasible project for the future.
At some point we were rolling out a new database tier for one product, which was supposed to have a really high volume of changes, but all coming in a uniform format. It took less than an hour (as most of the work had been done to create the rollback-based one) to come up with a prototype that would efficiently extract literals from uniform statements, then use them for prefetching.
This method worked fine &#8211; at a tiny fraction of the resources used by mk-slave-prefetch we were preloading secondary indexes and could have relatively extensive parallelism.
Meanwhile, our main database cluster was having a more and more uniform query workload, thanks to various libraries, abstractions and middleware &#8211; so a day of work on the lowest-hanging fruit provided relatively good coverage of the workload.
We didn&#8217;t stop mksp.pl &#8211; it still provided some coverage for various odd cases, which were time-consuming to work on manually.
There were a few other problems with the new method &#8211; apparently we were targeting our SELECTs too accurately &#8211; UPDATEs were spending plenty of time in records_in_range. Additionally, the optimistic update path was reading in pages that SELECTs wouldn&#8217;t (due to inefficiency in the B-Tree locking code). There were some odd reads done for INSERTs.
Also, SELECTs use indexing less efficiently &#8211; InnoDB can pinpoint entries in secondary indexes by using PK values, yet that ability is not exposed to the SQL layer, so prefetching on indexes that don&#8217;t have all fields explicitly defined within them is not that easy.
In theory, all these issues are supposed to be &#8216;fixed&#8217; by the fake changes concept. Percona recently implemented it in their releases, and we started experimenting with those changes. It is still not that mature a concept, so we will be revisiting how things are or should be done, but for now test results are quite positive (we made some changes to reduce locking and avoid deadlock in REPLACE INTO, among other things).
I still observe I/Os done by the main replication thread, so we&#8217;re not in perfect shape yet, but the method seems to be working relatively well (at least it definitely speeds up replication). We still have to do lots of testing to qualify this for large-scale production, but this may allow way more write workload on our machines until we get parallel replication all around.
Our code for the custom query, fake changes or rollback prefetcher can be checked out from a public repo together with other tools (oops, Bazaar doesn&#8217;t give easy access to subdirectories):
bzr co lp:mysqlatfacebook/tools; cd prefetch
Or browse it online.
P.S. There&#8217;s also Tungsten Replicator for ones who don&#8217;t want to wait for 5.6 parallel replication.]]></description>
			<content:encoded><![CDATA[<p><em>For the impatient ones, or ones that prefer code to narrative, <a href="http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/tools/files/head%3A/prefetch/">go here</a>. This is long overdue anyway, and Yoshinori already <a href="http://yoshinorimatsunobu.blogspot.com/2011/10/making-slave-pre-fetching-work-better.html">beat me</a>, hehe&#8230;</em></p>
<p>Our database environment is quite busy &#8211; there are millions of row changes a second and millions of I/O operations a second, and the impact of that can be felt at each shard. Especially as we also have to replicate to other datacenters, single-threaded replication on MySQL becomes a real bottleneck.</p>
<p>We use multiple methods to understand and analyze replication lag composition &#8211; simple replication thread state sampling via the MySQL processlist helps to understand logical workload components (and work in that field yields great results), and pstack/GDB-based replication thread sampling shows server internal behavior quite well too (a similar technique was used for <a title="On connections" href="http://dom.as/2011/08/28/mysql-connection-accept-speed/">accept thread visualisation</a>).</p>
<p>The biggest problem with a single replication thread is that it has to read data to execute queries (rather than applying physical page deltas, like PG, or just appending to files, like HBase, it does logical edits to page data) &#8211; we can observe 95% of process time in that state. As there&#8217;s generally just one outstanding data read per replication thread, other workload hitting the machine will also make replication reads slower.</p>
<p>Generally, the obvious way to deal with slow I/O is to issue more outstanding parallel requests, and the only way to do that, apart from parallel replication, is to predict what will be needed in the future and try to fetch it.</p>
<p>Many, many moons ago Paul Tuckfield talked about the YouTube replication prefetcher &#8211; it would take write statements yet to be executed from the relay logs, convert them to SELECTs and run them before the replication thread needs that data. He still says that was one of his most satisfying quick hacks :-)</p>
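<p>A minimal sketch of that rewrite idea in Python, under loud assumptions: the function name write_to_select and the two regular expressions are hypothetical and only handle single-table UPDATE and DELETE statements with a plain WHERE clause. It is not the YouTube tool or mk-slave-prefetch, just an illustration of turning a pending write into a read that touches the same rows.</p>

```python
import re

# Hypothetical, deliberately simplified patterns. Real relay-log statements
# are far more varied (joins, subqueries, the word WHERE inside literals).
_UPDATE_RE = re.compile(
    r"^\s*UPDATE\s+(\S+)\s+SET\s+.+?\s+WHERE\s+(.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)
_DELETE_RE = re.compile(
    r"^\s*DELETE\s+FROM\s+(\S+)\s+WHERE\s+(.+?)\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def write_to_select(statement):
    """Return a SELECT reading the rows a simple UPDATE/DELETE would touch.

    Returns None for any statement the simplified patterns do not recognise.
    """
    for pattern in (_UPDATE_RE, _DELETE_RE):
        match = pattern.match(statement)
        if match:
            table, where = match.group(1), match.group(2)
            return "SELECT * FROM {0} WHERE {1}".format(table, where)
    return None


# Example: the prefetcher would run this SELECT ahead of the replication thread.
example = write_to_select("UPDATE users SET name = 'x' WHERE id = 42;")
```

<p>A real prefetcher would read such statements from the relay log ahead of the replication thread and run the resulting SELECTs on several connections in parallel; statements the patterns do not recognise are simply skipped.</p>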
<p>Maatkit (now Percona Toolkit) introduced mk-slave-prefetch (I <a href="http://dom.as/2008/06/07/50-journal-various-issues-replication-prefetching-our-branch/">played with it</a> back in 2008, didn&#8217;t put it into operation at that time though), and eventually that looked like a reasonable option for prefetching statements on our database cluster.</p>
<p>5000 lines of Perl is not the easiest code to work with (or to debug), so the journey was quite bumpy. We got it working in some shape, eventually, but Baron, original author, <a href="http://twitter.com/#!/xaprb/statuses/128876485472829440">has something to say</a> about it:</p>
<p><em>Please don&#8217;t use mk-slave-prefetch on MySQL unless you are Facebook. Or at least don&#8217;t tell your friends, so they won&#8217;t use it.</em></p>
<p>Anyway, our update rate would saturate mksp.pl if we used anything fancier on it, so it was a constant balancing act, in which looking at the code was something nobody wanted to do ;-) Still, it was (and is) helping us, so getting rid of it wasn&#8217;t possible either.</p>
<p>At some point we decided to run an experiment &#8211; what if we executed statements, then rolled them back? I did a quick implementation of that method from scratch in Python, and the resulting piece of code was relatively small and fun to experiment with.</p>
<p>There were multiple problems with such an approach. One complication was that queries were grabbing locks for the duration of the statement, and some of those locks would collide with what the actual replication thread was doing. Fixing that would require an immediate lock wait timeout or transaction kill for the prefetcher thread &#8211; so, a relatively deep dive into InnoDB. Another problem was internal InnoDB lock contention on rollbacks &#8211; rollback was an expensive operation, and the benefits of pages read in were negated by rollback segment lock contention. Fixing that is even more extensive InnoDB work (though probably some people would like their rollbacks to be efficient ;-)</p>
<p>At that moment we came up with the idea that the InnoDB codebase could be instrumented to not do any real work on updates &#8211; just page data in and return to the caller, and if any change accidentally slips in, commits can fail. That looked like a feasible project for the future.</p>
<p>At some point we were rolling out a new database tier for one product, which was supposed to have a really high volume of changes, all coming in a uniform format. It took less than an hour (as most of the work had already been done for the rollback-based one) to come up with a prototype that would efficiently extract literals from uniform statements, then use them for prefetching.</p>
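<p>A rough sketch of what literal extraction from uniform statements can look like is below. The table, column and index names, as well as the statement shape, are invented for illustration and are not taken from the real prefetcher.</p>

```python
import re

# Hypothetical uniform statement shape, paired with the prefetch query that
# should be issued for it. All identifiers here are assumptions.
TEMPLATES = [
    (
        re.compile(r"^UPDATE assoc SET data = '.*' WHERE id1 = (\d+) AND type = (\d+)$"),
        "SELECT id2 FROM assoc FORCE INDEX (id1_type) WHERE id1 = {} AND type = {}",
    ),
]


def prefetch_queries(statements):
    """Yield one prefetch SELECT per statement that matches a known template."""
    for statement in statements:
        for pattern, select in TEMPLATES:
            match = pattern.match(statement)
            if match:
                # Only the literals are extracted; the statement is never parsed.
                yield select.format(*match.groups())
                break


relay_log = [
    "UPDATE assoc SET data = 'x' WHERE id1 = 10 AND type = 3",
    "DELETE FROM other WHERE a = 1",
]
for query in prefetch_queries(relay_log):
    print(query)
```

<p>Because each template is a single regular expression match, this is much cheaper than general statement rewriting, and the resulting SELECTs can be spread over several worker connections.</p>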
<p>This method worked fine &#8211; at a tiny fraction of the resources used by mk-slave-prefetch we were preloading secondary indexes and could have relatively extensive parallelism.</p>
<p>Meanwhile, our main database cluster had a more and more uniform query workload, thanks to various libraries, abstractions and middleware &#8211; so a day of work on the lowest-hanging fruit provided relatively good coverage of the workload.</p>
<p>We didn&#8217;t stop mksp.pl &#8211; it still provided some coverage for various odd cases, which were time-consuming to work on manually.</p>
<p>There were a few other problems with the new method &#8211; apparently we were targeting our SELECTs too accurately &#8211; UPDATEs were spending plenty of time in <a href="http://dom.as/2011/01/27/a-case-for-force-index/">records_in_range</a>. Additionally, the optimistic update path was reading in pages that SELECTs wouldn&#8217;t (due to an <a href="http://bugs.mysql.com/bug.php?id=61736">inefficiency</a> in B-Tree locking code). There were some odd reads done for INSERTs.</p>
<p>Also, SELECTs use indexes less efficiently &#8211; InnoDB can pinpoint entries in secondary indexes by using PK values, yet that ability is <a href="http://bugs.mysql.com/bug.php?id=62025">not exposed</a> to the SQL layer, so prefetching on indexes that don&#8217;t have all fields explicitly defined within them is not that easy.</p>
<p>In theory, all these issues are supposed to be &#8216;fixed&#8217; by the fake changes concept. Percona recently <a href="http://www.percona.com/doc/percona-server/5.5/management/innodb_fake_changes.html">implemented it in their releases</a>, and we started experimenting with those changes. It is still not that mature a concept, so we will be revisiting how things are or should be done, but for now test results are quite positive (we made some changes to reduce locking and avoid a deadlock in REPLACE INTO, among other things).</p>
<p>I still observe I/Os done by the main replication thread, so we&#8217;re not in perfect shape yet, but the method seems to be working relatively well (at least it definitely speeds up replication). We still have to do lots of testing to qualify this for large-scale production, but this may allow way more write workload on our machines until we get parallel replication all around.</p>
<p>Our code for the custom query, fake changes and rollback prefetchers can be checked out from a public repo together with other tools (oops, Bazaar doesn&#8217;t give easy access to subdirectories):</p>
<pre>bzr co lp:mysqlatfacebook/tools; cd tools/prefetch</pre>
<p>Or <a href="http://bazaar.launchpad.net/~mysqlatfacebook/mysqlatfacebook/tools/files/head%3A/prefetch/">browse it online</a>.</p>
<p>P.S. There&#8217;s also <a href="http://www.continuent.com/solutions/tungsten-replicator">Tungsten Replicator</a> for ones who don&#8217;t want to wait for 5.6 parallel replication.</p>
]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/12/04/on-mysql-replication-prefetching/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On connections</title>
		<link>http://dom.as/2011/08/28/mysql-connection-accept-speed/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=on-connections</link>
		<comments>http://dom.as/2011/08/28/mysql-connection-accept-speed/#comments</comments>
		<pubDate>Sun, 28 Aug 2011 21:51:28 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[profiling]]></category>

		<guid isPermaLink="false">http://dom.as/?p=897</guid>
		<description><![CDATA[MySQL is needlessly slow at accepting new connections. People usually work around that by having various sorts of connection pools, but there’s always a scale at which connection pools are not feasible. Sometimes connection avalanches come unexpected...]]></description>
			<content:encoded><![CDATA[<p>MySQL is needlessly slow at accepting new connections. People usually work around that by having various sorts of connection pools, but there’s always a scale at which connection pools are not feasible. Sometimes connection avalanches come unexpected, and even if MySQL would have no trouble dealing with queries, it will have problems letting clients in. Something has to be done about it.</p>
<p>Lots of these problems have been low-hanging fruit for years &#8211; they were ‘not detected’ by benchmarks because everyone who benchmarks MySQL knows that persistent connections are much faster and therefore doesn’t look at connection speeds anymore. </p>
<p>People usually attribute most of the slowness to the LOCK_thread_count mutex &#8211; they are only partially right. This mutex does not just guard the counter of active running connections; pretty much every operation that deals with an increase or decrease of threads (thread cache, active thread lists, etc.) has to hold it for a while. </p>
<p>Also, it is common wisdom to use the <a href="http://dev.mysql.com/doc/refman/5.5/en/connection-threads.html">thread cache</a>, but what people quite often miss is that the thread cache was created back when OS threads were extremely expensive to create, and all it does is cache pthreads. It does not do any MySQL-specific thread caching magic &#8211; everything gets completely reinitialized for each incoming connection. </p>
<p>I decided to attack this problem based on a very simple hypothesis &#8211; whatever the ‘accept thread’ is doing is a bottleneck for the whole process. It is very simple to analyze everything from this perspective (and I had some success looking at replication threads from this perspective). </p>
<p>All we need is gdb and two loops &#8211; gdb attaches to accept thread, one loop does ‘breakpoint; continue’, another sends signals at a certain sampling rate (I picked 10Hz in order to avoid profiling bias). I <a href="https://www.facebook.com/poormansprofiler/posts/147734715313155">posted</a> those scripts on <a href="https://www.facebook.com/poormansprofiler">PMP page</a>. After a lunch break I had 50k stacks (long lunch ;-) that I fed into graphviz for full data visualisation and could look at individually:</p>
<p><a href="http://dom.as/tech/connects-5.5.png"><img src="http://dom.as/tech/connects-5.5-thumb.png" /></a></p>
<p>A picture is worth a thousand words (well, it is easier than looking at thousands of lines of stack aggregations), and I immediately noticed a few things worth looking at:</p>
<ul>
<li>Initializing THD (MySQL thread) structure is CPU-heavy task that resides in choke-point thread</li>
<li>There is way too much time spent in syscalls, whatever they do</li>
<li>Too much memory allocation done by the master thread</li>
<li>There’s mutex contention on thread cache waking up worker threads</li>
<li>There’s needless mutex contention in few other places</li>
</ul>
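<p>For reference, turning sampled stacks into a graph like the one above can be sketched roughly as follows. This is not the script posted on the PMP page; the function name and the input format (one list of frame names per sample, outermost frame first) are assumptions made for this example.</p>

```python
from collections import Counter


def stacks_to_dot(stacks):
    """Turn sampled stacks (outermost frame first) into a weighted DOT graph."""
    edges = Counter()
    for frames in stacks:
        # Count each caller/callee pair once per sample it appears in.
        for caller, callee in set(zip(frames, frames[1:])):
            edges[(caller, callee)] += 1
    lines = ["digraph stacks {"]
    for (caller, callee), count in sorted(edges.items()):
        lines.append('  "{}" -> "{}" [label="{}"];'.format(caller, callee, count))
    lines.append("}")
    return "\n".join(lines)


# Example frame names; real input would come from the gdb sampling loop.
samples = [
    ["handle_connections_sockets", "accept"],
    ["handle_connections_sockets", "create_new_thread", "malloc"],
    ["handle_connections_sockets", "accept"],
]
print(stacks_to_dot(samples))
```

<p>The output can be piped into graphviz (for example <code>dot -Tpng</code>), with the edge labels showing how many samples went through each call.</p>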
<p>I didn’t want to look at mutex contention issues first so I ended up with something as simple as looking at syscall costs. </p>
<ul>
<li>15% was going into actual accept()</li>
<li>8.5% was going into poll()</li>
<li>8% went into fcntl()</li>
<li>7% went into setsockopt()</li>
<li>1.2% went into getsockname()</li>
</ul>
<p>An strace on mysqld gives a picture that explains quite a bit:<br />
<code><br />
poll([{fd=12, ...}, {fd=13, ...}], 2, -1) = 1<br />
fcntl(12, F_GETFL)  = 0x2 (flags O_RDWR)<br />
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK)   = 0<br />
accept(12, {... sin_port=htons(59183), ...) = 32<br />
fcntl(12, F_SETFL, O_RDWR)<br />
getsockname(32, {... sin_port=htons(3306), ...) = 0<br />
fcntl(32, F_SETFL, O_RDONLY)<br />
fcntl(32, F_GETFL) = 0x2 (flags O_RDWR)<br />
setsockopt(32, SOL_SOCKET, SO_RCVTIMEO, ...)<br />
setsockopt(32, SOL_SOCKET, SO_SNDTIMEO, ...)<br />
fcntl(32, F_SETFL, O_RDWR|O_NONBLOCK)<br />
setsockopt(32, SOL_IP, IP_TOS, [8], 4)<br />
setsockopt(32, SOL_TCP, TCP_NODELAY, [1], 4)<br />
</code></p>
<p>I’ll skip walking through the code, but essentially what it does here is (12 is accept socket, 32 is connection socket):</p>
<ul>
<li>poll() checks whether there are pending connections. If the server is busy, trying accept() first and calling poll() only on failure is a better approach. There are side effects with that idea though &#8211; other sockets may starve a bit, but that is solvable by injecting an occasional poll().</li>
<li>What happens next is a bit sad. Instead of storing per-socket flags (nobody is touching them for now anyway), it gets the socket flags, figures out it is a blocking socket, sets it to nonblocking mode, accepts the connection, and sets it back to blocking mode. Just setting it to nonblocking at the start and using it that way forever is much cheaper and constipates way less.</li>
<li>accept() itself can be scaled only by having parallel accept() threads. Maybe most of this post would not be necessary if there were multiple accept threads, but I’m not eager to go into that kind of refactoring for now.</li>
<li>getsockname() is used just to verify that the socket is correct (probably catching EINVAL later seemed too complicated). It is a very pessimistic code path for a case that nearly never happens (it probably was added for some random Unix back in the nineties).</li>
<li>The next fcntl() “get flags” call is quite unnecessary &#8211; this is a fresh socket and one shouldn’t expect anything special within it. Non-blocking mode is set later, so that overrides whatever was obtained here.</li>
<li>Three out of four setsockopt()s are a necessary evil (one turns off <a href="http://en.wikipedia.org/wiki/Nagle's_algorithm">Nagle’s algorithm</a>, the other two set socket timeouts), so they have to be done before network I/O is done on the socket. The fourth setsockopt() is usually completely useless &#8211; not every network observes the <a href="http://en.wikipedia.org/wiki/Type_of_Service">IP_TOS header</a>, and one has to talk to a network administrator first about decent values. I’d say it can be an optional parameter (yay, more tuning options).</li>
</ul>
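<p>The first two points can be sketched in a few lines. This is Python rather than the server’s C++, and the helper names are made up for this example; it only illustrates the ordering of calls: the listening socket is switched to non-blocking mode once, accept() is tried first, and waiting for readability happens only when nothing is pending.</p>

```python
import select
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening socket that is set to non-blocking mode exactly once."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, port))
    listener.listen(128)
    listener.setblocking(False)  # never toggled back and forth per accept()
    return listener


def accept_batch(listener, limit):
    """Accept `limit` connections, waiting only when none are pending."""
    accepted = []
    while len(accepted) != limit:
        try:
            conn, _addr = listener.accept()  # try accept() first
        except BlockingIOError:
            # Nothing pending: fall back to waiting for readability.
            select.select([listener], [], [])
            continue
        # Per-connection setup (timeouts, TCP_NODELAY) belongs in a worker.
        accepted.append(conn)
    return accepted
```

<p>Under a connection avalanche the exception path is rarely taken, so the loop costs one syscall per connection instead of the four seen in the strace above.</p>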
<p>Pretty much every connection socket operation can be done later, in a worker thread, without consuming expensive accept thread time, and pretty much every syscall except accept() can be removed from a busy accept thread, which is what I did in my testing build.</p>
<p>Once I got rid of the syscalls I started looking at other low-hanging fruit. The most obvious one was sprintf() called inside vio_new(). Though it accounted for only 4% of thread time, the uselessness of it was depressing. Here it is:</p>
<pre>
sprintf(vio-&gt;desc,
   (vio-&gt;type == VIO_TYPE_SOCKET ? "socket (%d)" : "TCP/IP (%d)"),
   vio-&gt;sd);
</pre>
<p>It formats a string that is not used at all by production builds (only few DBUG messages are calling vio_description()). Though I removed this code in non-debug build, as I was moving over network initialization to worker threads, whole my_net_init() and vio() ended up outside of accept thread anyway ;-)</p>
<p>The overall thread cache design is centered around LOCK_thread_count &#8211; lock is held while signaling threads, and threads that wake up need the lock too &#8211; so there’s lots of overhead involved in the coordination &#8211; 13% of time is spent just to pass the task to a worker thread.</p>
<p>Allowing multiple threads to wake up and multiple entries to be placed into thread cache before it is all drained (more of an InnoDB concurrency-queue with FLIFO approach) could be somewhat better &#8211; so would be worker threads accepting connections directly (I already said that, I guess). There’s simply too much time wasted waking up and sending threads to sleep, and quite some of that time is on a choke point. </p>
<p>THD initializations are somewhat simpler, as they don’t include SMP madness. </p>
<p>There’s some low-hanging fruit there as well, of course. For example, the THD initializer calls sql_rnd_with_mutex(), which locks the thread count mutex. The simplest fix could be using another mutex, though a lockless random function or on-demand variable initialization would help too.</p>
<p>Some initializers there are quite expensive too &#8211; e.g. Warning_info class could initialize dynamic storage only when actually used, and not at THD initialization chokepoint. THD::init can be moved to a worker thread, and lots of THD initialization could be moved over to it. </p>
<p>Quite a lot of time (12%) is spent on malloc() &#8211; and lots of that is for allocating lots of various fixed-size structures &#8211; slab allocator (or just more efficient malloc implementation) could cut on CPU time there. Of course, more drastic alternative is not dealing with THD at all during accept phase &#8211; one can pass stub structure to build upon later, or (oh, am I writing this again) moving accept() part to individual workers. </p>
<p>So far I tested just few optimizations &#8211; moved over vio/net initialization to worker threads, reduced number of syscalls, added a new mutex for rand initialization, and that alone got me <b>additional 50% increase</b> in connection accepts. Think how much more one could get from fixing this problem properly ;-)</p>
<p><b>TL;DR:</b> MySQL sucks at accepting new connections, but there’re lots of low hanging fruit there. Ask your MySQL provider for a fix.</p>
<p>MySQL bug entries:</p>
<ul>
<li>rand(): <a href="http://bugs.mysql.com/bug.php?id=62282">#62282</a></li>
<li>fcntl on accept(): <a href="http://bugs.mysql.com/bug.php?id=62283">#62283</a></li>
<li>network init costs: <a href="http://bugs.mysql.com/bug.php?id=62284">#62284</a></li>
<li>vio sprintf(): <a href="http://bugs.mysql.com/bug.php?id=62285">#62285</a></li>
<li>poll() vs accept(): <a href="http://bugs.mysql.com/bug.php?id=62286">#62286</a></li>
<li>thread cache performance: <a href="http://bugs.mysql.com/bug.php?id=62287">#62287</a></li>
<li>THD initialization: <a href="http://bugs.mysql.com/bug.php?id=62288">#62288</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/08/29/on-connections/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facebook Open Graph Meta WordPress Plugin</title>
		<link>http://feedproxy.google.com/~r/Thinkdiffnet/~3/x6QZ0a8m9S8/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=facebook-open-graph-meta-wordpress-plugin</link>
		<comments>http://feedproxy.google.com/~r/Thinkdiffnet/~3/x6QZ0a8m9S8/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 18:21:40 +0000</pubDate>
		<dc:creator>Mahmud Ahsan</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[facebook like]]></category>
		<category><![CDATA[facebook open graph]]></category>
		<category><![CDATA[facebook share]]></category>
		<category><![CDATA[og]]></category>
		<category><![CDATA[open graph meta]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://thinkdiff.net/?p=2379</guid>
		<description><![CDATA[Have you ever noticed that if you implement Facebook Like or Facebook Share in a WordPress blog, the shared post on the user&#8217;s wall doesn&#8217;t look good most of the time when people click Like? This is because you didn&#8217;t implement the Facebook Open Graph meta data in your blog post or page. As a result, when Facebook parses the link, it sometimes can&#8217;t parse it the way you expected.
To solve this, you have to add Open Graph meta data to your site. Some days ago I added it manually to my blog&#8217;s theme, but later I decided to make a WordPress plugin so that it becomes easier to use and share with others.

My plugin features:
1. Automatically set facebook open graph meta data in your wordpress site
2. Open graph meta data will be dynamic based on post or page
3. In the admin panel you can provide AppId
4. In the admin panel you can provide facebook user id (admins)
5. You can set a default image that will be used when there is no image associated with a post or page
6. On the plugin admin page, you&#8217;ll see detailed instructions for setting up a Facebook app.
7. The plugin is released under New BSD License.

Plugin setting page

You will see a detailed tutorial on how to set up a Facebook application and retrieve the information on the plugin settings page.
If you activate this plugin, posts or pages from your WordPress site will show up nicely on people&#8217;s Facebook walls when they share or like them, so their friends will be more inclined to click the link.

After you have successfully installed the plugin and changed the html tag yourself, view the source of your blog page/post in a browser and you&#8217;ll notice the following things:
HTML tag is changed

&#60;html xmlns=&#34;http://www.w3.org/1999/xhtml&#34; xmlns:fb=&#34;http://www.facebook.com/2008/fbml&#34; xmlns:og=&#34;http://opengraphprotocol.org/schema/&#34; dir=&#34;ltr&#34; lang=&#34;en-US&#34;&#62;

Open graph meta data added

&#60;!-- Facebook Open Graph --&#62;
&#60;meta property=&#34;fb:app_id&#34; content=&#34;XXXXXXXXXXX&#34; /&#62;
&#60;meta property=&#34;fb:admins&#34; content=&#34;YYYYYYYYYYY&#34; /&#62;
&#60;meta property=&#34;og:url&#34; content=&#34;http://thinkdiff.net/facebook/sharekit-must-have-ios-app-share-library/&#34;/&#62;
&#60;meta property=&#34;og:site_name&#34; content=&#34;Thinkdiff.net&#34; /&#62;
&#60;meta property=&#34;og:description&#34; content=&#34;geeky stuff, facebook, twitter, linkedin, php, mysql, web development, tips and more&#34; /&#62;

&#60;meta property=&#34;og:type&#34; content=&#34;website&#34; /&#62;
&#60;meta property=&#34;og:image&#34; content=&#34;http://thinkdiff.net/image/thinkdiff.net_splash.jpg&#34; /&#62;

Where XXX&#8230; and YYY&#8230; should be your app id and admin user id.
Hope this helps]]></description>
			<content:encoded><![CDATA[<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/fb_open_graph.jpg" rel="lightbox[2379]"><img class="alignleft size-full wp-image-2382" title="fb_open_graph" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/fb_open_graph.jpg" alt="" width="172" height="172" /></a>Have you ever noticed that if you implement <strong>Facebook Like</strong> or <strong>Facebook Share</strong> in a WordPress blog, the shared post on the user&#8217;s wall doesn&#8217;t look good most of the time when people click <strong>Like</strong>? This is because you didn&#8217;t implement the <a href="http://developers.facebook.com/docs/opengraph/" >Facebook Open Graph meta</a> data in your blog post or page. As a result, when Facebook parses the link, it sometimes can&#8217;t parse it the way you expected.</p>
<p>To solve this, you have to add Open Graph meta data to your site. Some days ago I added it manually to my blog&#8217;s theme, but later I decided to make a WordPress plugin so that it becomes easier to use and share with others.</p>
<p><strong>My plugin features:</strong></p>
<p>1. Automatically set facebook open graph meta data in your wordpress site<br />
2. Open graph meta data will be dynamic based on post or page<br />
3. In the admin panel you can provide AppId<br />
4. In the admin panel you can provide facebook user id (admins)<br />
5. You can set a default image that will be used when there is no image associated with a post or page<br />
6. On the plugin admin page, you&#8217;ll see detailed instructions for setting up a Facebook app.<br />
7. The plugin is released under <a href="http://thinkdiff.net/License.txt" >New BSD License.</a></p>
<p><a title="Download" href="https://github.com/mahmudahsan/Facebook-Open-Graph-Meta" ><img class="aligncenter size-full wp-image-2390" title="btn_download_wordpress" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/btn_download_wordpress.png" alt="" width="160" height="69" /></a></p>
<p><strong>Plugin setting page</strong></p>
<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/screen1.jpg" rel="lightbox[2379]"><img class="aligncenter size-full wp-image-2389" title="Setting Page" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/screen1.jpg" alt="" width="577" height="303" /></a></p>
<p><em>You will see a detailed tutorial on how to set up a Facebook application and retrieve the information on the plugin settings page.</em></p>
<p>If you activate this plugin, posts or pages from your WordPress site will show up nicely on people&#8217;s Facebook walls when they share or like them, so their friends will be more inclined to click the link.</p>
<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/590x300.jpg" rel="lightbox[2379]"><img class="aligncenter size-full wp-image-2383" title="facebook wall post" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/590x300.jpg" alt="" width="575" height="292" /></a></p>
<p>After you have successfully installed the plugin and changed the html tag yourself, view the source of your blog page/post in a browser and you&#8217;ll notice the following things:<br />
HTML tag is changed</p>
<pre>
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; xmlns:fb=&quot;http://www.facebook.com/2008/fbml&quot; xmlns:og=&quot;http://opengraphprotocol.org/schema/&quot; dir=&quot;ltr&quot; lang=&quot;en-US&quot;&gt;
</pre>
<p>Open graph meta data added</p>
<pre>
&lt;!-- Facebook Open Graph --&gt;
&lt;meta property=&quot;fb:app_id&quot; content=&quot;XXXXXXXXXXX&quot; /&gt;
&lt;meta property=&quot;fb:admins&quot; content=&quot;YYYYYYYYYYY&quot; /&gt;
&lt;meta property=&quot;og:url&quot; content=&quot;http://thinkdiff.net/facebook/sharekit-must-have-ios-app-share-library/&quot;/&gt;
&lt;meta property=&quot;og:site_name&quot; content=&quot;Thinkdiff.net&quot; /&gt;
&lt;meta property=&quot;og:description&quot; content=&quot;geeky stuff, facebook, twitter, linkedin, php, mysql, web development, tips and more&quot; /&gt;

&lt;meta property=&quot;og:type&quot; content=&quot;website&quot; /&gt;
&lt;meta property=&quot;og:image&quot; content=&quot;http://thinkdiff.net/image/thinkdiff.net_splash.jpg&quot; /&gt;
</pre>
<p>Where XXX&#8230; and YYY&#8230; should be your app id and admin user id.</p>
<p>Hope this helps <img src="http://thinkdiff.net/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p>

]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/08/26/facebook-open-graph-meta-wordpress-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facebook Open Graph Meta WordPress Plugin</title>
		<link>http://feedproxy.google.com/~r/Thinkdiffnet/~3/x6QZ0a8m9S8/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=facebook-open-graph-meta-wordpress-plugin</link>
		<comments>http://feedproxy.google.com/~r/Thinkdiffnet/~3/x6QZ0a8m9S8/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 18:21:40 +0000</pubDate>
		<dc:creator>Mahmud Ahsan</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[facebook like]]></category>
		<category><![CDATA[facebook open graph]]></category>
		<category><![CDATA[facebook share]]></category>
		<category><![CDATA[og]]></category>
		<category><![CDATA[open graph meta]]></category>
		<category><![CDATA[Wordpress]]></category>

		<guid isPermaLink="false">http://thinkdiff.net/?p=2379</guid>
		<description><![CDATA[Have you ever noticed that if you implement Facebook Like or Facebook Share in a WordPress blog, the shared post on the user&#8217;s wall doesn&#8217;t look good most of the time when people click Like? This is because you didn&#8217;t implement the Facebook Open Graph meta data in your blog post or page. As a result, when Facebook parses the link, it sometimes can&#8217;t parse it the way you expected.
To solve this, you have to add Open Graph meta data to your site. Some days ago I added it manually to my blog&#8217;s theme, but later I decided to make a WordPress plugin so that it becomes easier to use and share with others.

My plugin features:
1. Automatically set facebook open graph meta data in your wordpress site
2. Open graph meta data will be dynamic based on post or page
3. In the admin panel you can provide AppId
4. In the admin panel you can provide facebook user id (admins)
5. You can set a default image that will be used when there is no image associated with a post or page
6. On the plugin admin page, you&#8217;ll see detailed instructions for setting up a Facebook app.
7. The plugin is released under New BSD License.

Plugin setting page

You will see a detailed tutorial on how to set up a Facebook application and retrieve the information on the plugin settings page.
If you activate this plugin, posts or pages from your WordPress site will show up nicely on people&#8217;s Facebook walls when they share or like them, so their friends will be more inclined to click the link.

After you have successfully installed the plugin and changed the html tag yourself, view the source of your blog page/post in a browser and you&#8217;ll notice the following things:
HTML tag is changed

&#60;html xmlns=&#34;http://www.w3.org/1999/xhtml&#34; xmlns:fb=&#34;http://www.facebook.com/2008/fbml&#34; xmlns:og=&#34;http://opengraphprotocol.org/schema/&#34; dir=&#34;ltr&#34; lang=&#34;en-US&#34;&#62;

Open graph meta data added

&#60;!-- Facebook Open Graph --&#62;
&#60;meta property=&#34;fb:app_id&#34; content=&#34;XXXXXXXXXXX&#34; /&#62;
&#60;meta property=&#34;fb:admins&#34; content=&#34;YYYYYYYYYYY&#34; /&#62;
&#60;meta property=&#34;og:url&#34; content=&#34;http://thinkdiff.net/facebook/sharekit-must-have-ios-app-share-library/&#34;/&#62;
&#60;meta property=&#34;og:site_name&#34; content=&#34;Thinkdiff.net&#34; /&#62;
&#60;meta property=&#34;og:description&#34; content=&#34;geeky stuff, facebook, twitter, linkedin, php, mysql, web development, tips and more&#34; /&#62;

&#60;meta property=&#34;og:type&#34; content=&#34;website&#34; /&#62;
&#60;meta property=&#34;og:image&#34; content=&#34;http://thinkdiff.net/image/thinkdiff.net_splash.jpg&#34; /&#62;

Here XXX&#8230; and YYY&#8230; stand for your app ID and admin user ID.
Hope this helps]]></description>
			<content:encoded><![CDATA[<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/fb_open_graph.jpg" rel="lightbox[2379]"><img class="alignleft size-full wp-image-2382" title="fb_open_graph" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/fb_open_graph.jpg" alt="" width="172" height="172" /></a>Have you ever noticed that when you add <strong>Facebook Like</strong> or <strong>Facebook Share</strong> to a WordPress blog and people click <strong>Like</strong>, the shared post often looks poor on the user&#8217;s wall? This usually happens because the blog post or page does not include <a href="http://developers.facebook.com/docs/opengraph/" >Facebook Open Graph meta</a> data. As a result, when Facebook parses the link, it sometimes cannot parse it the way you expected.</p>
<p>To fix this, you have to add Open Graph meta data to your site. Some days ago I added it manually to my blog&#8217;s theme, but later I decided to make a WordPress plugin so that it becomes easier to use and to share with others.</p>
<p><span></span></p>
<p><strong>My plugin features:</strong></p>
<p>1. Automatically set facebook open graph meta data in your wordpress site<br />
2. Open graph meta data will be dynamic based on post or page<br />
3. In the admin panel you can provide AppId<br />
4. In the admin panel you can provide facebook user id (admins)<br />
5. You can set a default image that will be used when there is no image associated with a post or page<br />
6. In the plugin admin page, you&#8217;ll see detailed instructions for setting up a Facebook app.<br />
7. The plugin is released under <a href="http://thinkdiff.net/License.txt" >New BSD License.</a></p>
<p><a title="Download" href="https://github.com/mahmudahsan/Facebook-Open-Graph-Meta" ><img class="aligncenter size-full wp-image-2390" title="btn_download_wordpress" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/btn_download_wordpress.png" alt="" width="160" height="69" /></a></p>
<p><strong>Plugin setting page</strong></p>
<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/screen1.jpg" rel="lightbox[2379]"><img class="aligncenter size-full wp-image-2389" title="Setting Page" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/screen1.jpg" alt="" width="577" height="303" /></a></p>
<p><em>The plugin settings page includes a detailed tutorial on how to set up a Facebook application and retrieve the required information.</em></p>
<p>Once you activate the plugin, posts and pages that people share or like will show up nicely on their Facebook walls, which makes their friends more likely to click the link.</p>
<p><a href="http://c2842062.r62.cf0.rackcdn.com/2011/08/590x300.jpg" rel="lightbox[2379]"><img class="aligncenter size-full wp-image-2383" title="facebook wall post" src="http://c2842062.r62.cf0.rackcdn.com/2011/08/590x300.jpg" alt="" width="575" height="292" /></a></p>
<p>After you have installed the plugin and changed the html tag yourself, view the source of a blog page or post in your browser and you&#8217;ll notice the following:<br />
The HTML tag is changed</p>
<pre>
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; xmlns:fb=&quot;http://www.facebook.com/2008/fbml&quot; xmlns:og=&quot;http://opengraphprotocol.org/schema/&quot; dir=&quot;ltr&quot; lang=&quot;en-US&quot;&gt;
</pre>
<p>Open graph meta data added</p>
<pre>
&lt;!-- Facebook Open Graph --&gt;
&lt;meta property=&quot;fb:app_id&quot; content=&quot;XXXXXXXXXXX&quot; /&gt;
&lt;meta property=&quot;fb:admins&quot; content=&quot;YYYYYYYYYYY&quot; /&gt;
&lt;meta property=&quot;og:url&quot; content=&quot;http://thinkdiff.net/facebook/sharekit-must-have-ios-app-share-library/&quot;/&gt;
&lt;meta property=&quot;og:site_name&quot; content=&quot;Thinkdiff.net&quot; /&gt;
&lt;meta property=&quot;og:description&quot; content=&quot;geeky stuff, facebook, twitter, linkedin, php, mysql, web development, tips and more&quot; /&gt;

&lt;meta property=&quot;og:type&quot; content=&quot;website&quot; /&gt;
&lt;meta property=&quot;og:image&quot; content=&quot;http://thinkdiff.net/image/thinkdiff.net_splash.jpg&quot; /&gt;
</pre>
<p>Here XXX&#8230; and YYY&#8230; stand for your app ID and admin user ID.</p>
<p>Hope this helps <img src="http://thinkdiff.net/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p>

<br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29810&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29810&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/08/26/facebook-open-graph-meta-wordpress-plugin/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Facebook praises Tungsten</title>
		<link>http://continuent-tungsten.blogspot.com/2011/08/in-his-latest-blog-post-robert-hodges.html?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=facebook-praises-tungsten</link>
		<comments>http://continuent-tungsten.blogspot.com/2011/08/in-his-latest-blog-post-robert-hodges.html#comments</comments>
		<pubDate>Mon, 15 Aug 2011 21:15:24 +0000</pubDate>
		<dc:creator>Eero Teerikorpi</dc:creator>
				<category><![CDATA[Continuent Tungsten]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://planetmysql.ru/?guid=378f46e277ba92574fea39801abb604b</guid>
		<description><![CDATA[&#34;Tungsten has made huge strides this year in performance and usability.&#34;
http://www.facebook.com/MySQLatFacebook]]></description>
			<content:encoded><![CDATA[&quot;Tungsten has made huge strides this year in performance and usability.&quot;
http://www.facebook.com/MySQLatFacebook<br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29708&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29708&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/08/16/facebook-praises-tungsten/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>451 CAOS Links 2011.07.08</title>
		<link>http://feedproxy.google.com/~r/451opensource/~3/O53wsEmLgTY/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=451-caos-links-2011-07-08</link>
		<comments>http://feedproxy.google.com/~r/451opensource/~3/O53wsEmLgTY/#comments</comments>
		<pubDate>Fri, 08 Jul 2011 15:37:56 +0000</pubDate>
		<dc:creator>The 451 Group</dc:creator>
				<category><![CDATA[$15]]></category>
		<category><![CDATA[android]]></category>
		<category><![CDATA[australian government]]></category>
		<category><![CDATA[Carlo daffara]]></category>
		<category><![CDATA[CERN]]></category>
		<category><![CDATA[Chrome]]></category>
		<category><![CDATA[clojure]]></category>
		<category><![CDATA[cloudbees]]></category>
		<category><![CDATA[eclipse]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[GitHub]]></category>
		<category><![CDATA[harmony]]></category>
		<category><![CDATA[heroku]]></category>
		<category><![CDATA[Links]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[Open Hardware]]></category>
		<category><![CDATA[Samsung]]></category>
		<category><![CDATA[savio rodrigues]]></category>
		<category><![CDATA[sourceforge]]></category>
		<category><![CDATA[stonebraker]]></category>
		<category><![CDATA[Wistron]]></category>

		<guid isPermaLink="false">http://blogs.the451group.com/opensource/?p=5362</guid>
		<description><![CDATA[Harmony disharmony. Microsoft&#8217;s Android revenue. And more.
# The Harmony Project released version 1.0 of its templates for standard contributor license agreements prompting comment and criticism from Dave Neary, Stephen Walli, Richard Fontana and Bradley M Kuhn.
# Microsoft reportedly demanded $15 for each Android smartphone handset made by Samsung, while the company announced a new patent agreement with Wistron that specifically mentioned both Android and Chrome. In case you missed it, it has previously been argued that Microsoft makes more money from Android than it does Windows Phone.
# CloudBees joined the Eclipse Foundation as a Solutions Member and launched the CloudBees Toolkit for Eclipse plug-in.
# Carlo Daffara discussed open source as a differentiator (or not).
# &#8220;SourceForge is based around the idea of hosting open-source projects. GitHub is based around the idea of hosting open-source code.&#8221; Why SourceForge Lost
# CERN launched an Open Hardware initiative. 
# The Australian government published its Guide to Open Source Software.
# Savio Rodrigues discussed the apparent decline in open source contributions.
# Heroku added support for Clojure.
# Michael Stonebraker argued that Facebook&#8217;s MySQL deployment is a fate worse than death.]]></description>
			<content:encoded><![CDATA[<p>Harmony disharmony. Microsoft&#8217;s Android revenue. And more.</p>
<p># The Harmony Project <a href="http://harmonyagreements.org/agreements.html">released</a> version 1.0 of its templates for standard contributor license agreements prompting comment and criticism from <a href="http://blogs.gnome.org/bolsh/2011/07/06/harmony-agreements-reach-1-0/">Dave Neary</a>, <a href="http://www.networkworld.com/community/blog/peace-and-harmony">Stephen Walli</a>, <a href="http://opensource.com/law/11/7/trouble-harmony-part-1">Richard Fontana</a> and <a href="http://ebb.org/bkuhn/blog/2011/07/07/harmony-harmful.html">Bradley M Kuhn</a>.</p>
<p># Microsoft reportedly <a href="http://www.reuters.com/article/2011/07/06/us-samsung-microsoft-idUSTRE7651DB20110706">demanded</a> $15 for each Android smartphone handset made by Samsung, while the company <a href="http://www.microsoft.com/Presspass/press/2011/jul11/07-05WistronPR.mspx">announced</a> a new patent agreement with Wistron that specifically mentioned both Android and Chrome. In case you missed it, it has previously been <a href="http://allthingsd.com/20110527/microsofts-lucrative-new-revenue-stream-android/">argued</a> that Microsoft makes more money from Android than it does Windows Phone.</p>
<p># CloudBees <a href="http://www.marketwire.com/press-release/cloudbees-turbo-charges-java-developer-productivity-cloud-with-toolkit-eclipse-1535994.htm">joined</a> the Eclipse Foundation as a Solutions Member and launched the CloudBees Toolkit for Eclipse plug-in.</p>
<p># Carlo Daffara <a href="http://carlodaffara.conecta.it/open-source-as-a-differentiator/">discussed</a> open source as a differentiator (or not).</p>
<p># &#8220;SourceForge is based around the idea of hosting open-source projects. GitHub is based around the idea of hosting open-source code.&#8221; <a href="http://usersinhell.com/why-sourceforge-lost/">Why SourceForge Lost</a></p>
<p># CERN <a href="http://press.web.cern.ch/press/PressReleases/Releases2011/PR08.11E.html">launched</a> an Open Hardware initiative. </p>
<p># The Australian government <a href="http://agimo.govspace.gov.au/2011/07/01/open-source-software-guide/">published</a> its Guide to Open Source Software.</p>
<p># Savio Rodrigues <a href="http://www.infoworld.com/d/open-source-software/no-need-worry-open-source-contributions-decline-785">discussed</a> the apparent decline in open source contributions.</p>
<p># Heroku <a href="http://blog.heroku.com/archives/2011/7/5/clojure_on_heroku/">added</a> support for Clojure.</p>
<p># Michael Stonebraker <a href="http://gigaom.com/cloud/facebook-trapped-in-mysql-fate-worse-than-death">argued</a> that Facebook&#8217;s MySQL deployment is a fate worse than death.</p>
<br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29315&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29315&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/07/08/451-caos-links-2011-07-08/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>InnoDB locking makes me sad</title>
		<link>http://dom.as/2011/07/03/innodb-index-lock/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=innodb-locking-makes-me-sad</link>
		<comments>http://dom.as/2011/07/03/innodb-index-lock/#comments</comments>
		<pubDate>Sun, 03 Jul 2011 09:04:24 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[mysql]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://dom.as/?p=873</guid>
		<description><![CDATA[Vadim and others have pointed at the index-&#62;lock problems before, but I think they didn&#8217;t do a good enough job of pointing out how bad it can get (the actual problem was hidden somewhere as some odd edge case). What &#8216;index lock&#8217; generally means is that InnoDB has table-level locking, which will kill performance on big tables miserably.
InnoDB is a huge pie of layers, that have various locking behaviors, and are layered on top of each other, and are structured nicely as subdirectories in your innodb_plugin directory. Low level storage interfaces are done via os/ routines, then on top of that there&#8217;s some file space manager, fsp/, which allocates space for btr/ to live in, where individual page/ entities live, with multiple row/ pieces. There&#8217;re few other subsystems around, that got quite some attention lately &#8211; e.g. buf/ pool, transaction log/, and large trx/ transactions are composed of micro transactions living in mtr/. 
If you live in memory, you care about buffer pool and transaction log performance; if you write insane amounts of data to in-memory buffers, you hit mtr/ problems and depend on how fast you can write out log/ or flush out buf/. If you are in I/O-heavy land, most of the stuff you care about happens in btr/.
Generally InnoDB is quite good about read scalability in I/O bound environments &#8211; nowadays one can saturate really fast I/O devices and there will be plenty of parallel reads done. Major scalability problem in this field was read-ahead which was funneling all read-ahead activity into a small set of threads, but other than that there can be hundreds of parallel reads issued to underlying devices. Situation changes when writes are added to the mix, though again, there&#8217;re few different scenarios.
There are two ways for InnoDB to write out updates to pages, &#8220;optimistic&#8221; and &#8220;pessimistic&#8221;. Optimism here means that only an in-page (page/row) operation will be needed, without changing the tree structure. In the optimistic case you can expect quite high parallelism &#8211; multiple pages can be read for that operation at a time, multiple of them can be edited at a time, then some serialization will happen while writing out changes to redo log and undo segments. Expect good performance.
The much worse case is when the B-Tree has to be reorganized and multiple page operations can happen; that&#8217;s pessimism. In this case the whole index gets locked (via a read-write lock obtained from dict/),
then the B-Tree path is latched, then changes are done, then it is all unlocked until the next row operation needs to hit the tree. Unfortunately, both &#8216;path is latched&#8217; and &#8216;changes are done&#8217; are expensive operations, and not only in-core: they do synchronous page read-ins, one at a time, which are bound to be slow on busy systems serving lots of read load. Ironically, as no other operations can happen on the table at that time, you may find out you have spare I/O capacity.. ;-)
What gets quite interesting though is the actual operation needed to latch the B-Tree path. Usual wisdom would say that if you want to change a row (read-modify-write), you probably looked up the page already, so there won&#8217;t be I/O. Unfortunately, InnoDB uses a slightly more complicated B-Tree variant, where pages have links to neighbors, and tree latching does this (a bit simplified for reading clarity):

/* x-latch also brothers from left to right */
get_block = btr_block_get(space, zip_size, left_page_no, RW_X_LATCH, mtr);
get_block = btr_block_get(space, zip_size, page_no, RW_X_LATCH, mtr);
get_block = btr_block_get(space, zip_size, right_page_no, RW_X_LATCH, mtr);

So, essentially in this case, just because InnoDB is being pessimistic, it reads neighboring blocks to lock them, even if they may not be touched/accessed in any way &#8211; and bloats the buffer pool at that time with triple reads. It doesn&#8217;t cost much if the whole tree fits in memory, but it is doing three I/Os here, if we&#8217;re pessimistic about InnoDB being pessimistic (and I am). So, this isn&#8217;t just a locking problem &#8211; it is also a resource consumption problem at this stage.
Now, as the dictionary lock is held in write mode, not only updates to this table stop, but reads too &#8211; think MyISAM kind of stop. Of course, this &#8216;table locking&#8217; happens at an entirely different layer than in MyISAM. In MyISAM it is statement-length locking, whereas in InnoDB this lock is held just for a row operation on a single index, but if a statement is doing multiple row operations it can be acquired multiple times.
Probably there exist decent workarounds if anyone wants to tackle this &#8211; grabbing read locks on the tree while reading pages into buffer pool, then escalating lock to exclusive. A bit bigger architectural change would be allowing to grab locks on neighbors (if they are needed) without bringing in page data into memory &#8211; but that needs InnoDB overlords to look at it. Talk to your closest MySQL vendor and ask for a fix!
How do regular workloads hit this? The larger your records are, the more likely you are to have tree changes, and the lower your performance will be. In my edge case I was inserting 7k-sized rows &#8211; even though my machine had multiple disks, once the dataset fell out of the buffer pool, it couldn&#8217;t insert more than 50 rows a second, even though there were many disks idle and capacity gods cried. It gets worse with out-of-page blobs &#8211; then every operation is pessimistic.
Of course, there&#8217;re ways to work around this &#8211; usually by taking the hit of sharding/partitioning (this is where common wisdom of &#8220;large tables need to be partitioned&#8221; mostly comes from). Then, like with MyISAM, one will have multiple table locks and there may be some scalability then. 
TL;DR: The InnoDB index lock is a major architectural performance flaw, and that is why you hear that large tables are slower. There&#8217;s a big chance that there are more scalable engines for on-disk writes out there, and all the large InnoDB write/insert benchmarks were severely hit by this.]]></description>
			<content:encoded><![CDATA[<p>Vadim and others have <a href="http://www.mysqlperformanceblog.com/2010/02/25/index-lock-and-adaptive-search-next-two-biggest-innodb-problems/">pointed</a> at the index->lock problems before, but I think they didn&#8217;t do a good enough job of pointing out how bad it can get (the actual problem was hidden somewhere as some odd edge case). What &#8216;index lock&#8217; generally means is that InnoDB has table-level locking, which will kill performance on big tables miserably.</p>
<p>InnoDB is a huge pie of layers, that have various locking behaviors, and are layered on top of each other, and are structured nicely as subdirectories in your innodb_plugin directory. Low level storage interfaces are done via os/ routines, then on top of that there&#8217;s some file space manager, fsp/, which allocates space for btr/ to live in, where individual page/ entities live, with multiple row/ pieces. There&#8217;re few other subsystems around, that got quite some attention lately &#8211; e.g. buf/ pool, transaction log/, and large trx/ transactions are composed of micro transactions living in mtr/. </p>
<p>If you live in memory, you care about buffer pool and transaction log performance; if you write insane amounts of data to in-memory buffers, you hit mtr/ problems and depend on how fast you can write out log/ or flush out buf/. If you are in I/O-heavy land, most of the stuff you care about happens in btr/.</p>
<p>Generally InnoDB is quite good about read scalability in I/O bound environments &#8211; nowadays one can saturate really fast I/O devices and there will be plenty of parallel reads done. Major scalability problem in this field was read-ahead which was funneling all read-ahead activity into a small set of threads, but other than that there can be hundreds of parallel reads issued to underlying devices. Situation changes when writes are added to the mix, though again, there&#8217;re few different scenarios.</p>
<p>There are two ways for InnoDB to write out updates to pages, &#8220;optimistic&#8221; and &#8220;pessimistic&#8221;. Optimism here means that only an in-page (page/row) operation will be needed, without changing the tree structure. In the optimistic case you can expect quite high parallelism &#8211; multiple pages can be read for that operation at a time, multiple of them can be edited at a time, then some serialization will happen while writing out changes to redo log and undo segments. Expect good performance.</p>
<p>The much worse case is when the B-Tree has to be reorganized and multiple page operations can happen; that&#8217;s pessimism. In this case the whole index gets locked (via a read-write lock obtained from dict/),<br />
then the B-Tree path is latched, then changes are done, then it is all unlocked until the next row operation needs to hit the tree. Unfortunately, both &#8216;path is latched&#8217; and &#8216;changes are done&#8217; are expensive operations, and not only in-core: they do synchronous page read-ins, one at a time, which are bound to be slow on busy systems serving lots of read load. Ironically, as no other operations can happen on the table at that time, you may find out you have spare I/O capacity.. ;-)</p>
<p>What gets quite interesting though is the actual operation needed to latch the B-Tree path. Usual wisdom would say that if you want to change a row (read-modify-write), you probably looked up the page already, so there won&#8217;t be I/O. Unfortunately, InnoDB uses a slightly more complicated B-Tree variant, where pages have links to neighbors, and tree latching does this (a bit simplified for reading clarity):</p>
<p><code><br />
/* x-latch also brothers from left to right */<br />
get_block = btr_block_get(space, zip_size, left_page_no, RW_X_LATCH, mtr);<br />
get_block = btr_block_get(space, zip_size, page_no, RW_X_LATCH, mtr);<br />
get_block = btr_block_get(space, zip_size, right_page_no, RW_X_LATCH, mtr);<br />
</code></p>
<p>So, essentially in this case, just because InnoDB is being pessimistic, it reads neighboring blocks to lock them, even if they may not be touched/accessed in any way &#8211; and bloats the buffer pool at that time with triple reads. It doesn&#8217;t cost much if the whole tree fits in memory, but it is doing three I/Os here, if we&#8217;re pessimistic about InnoDB being pessimistic (and I am). So, this isn&#8217;t just a locking problem &#8211; it is also a resource consumption problem at this stage.</p>
<p>Now, as the dictionary lock is held in write mode, not only updates to this table stop, but reads too &#8211; think MyISAM kind of stop. Of course, this &#8216;table locking&#8217; happens at an entirely different layer than in MyISAM. In MyISAM it is statement-length locking, whereas in InnoDB this lock is held just for a row operation on a single index, but if a statement is doing multiple row operations it can be acquired multiple times.</p>
<p>Probably there exist decent workarounds if anyone wants to tackle this &#8211; grabbing read locks on the tree while reading pages into buffer pool, then escalating lock to exclusive. A bit bigger architectural change would be allowing to grab locks on neighbors (if they are needed) without bringing in page data into memory &#8211; but that needs InnoDB overlords to look at it. Talk to your closest MySQL vendor and ask for a fix!</p>
<p>How do regular workloads hit this? The larger your records are, the more likely you are to have tree changes, and the lower your performance will be. In my edge case I was inserting 7k-sized rows &#8211; even though my machine had multiple disks, once the dataset fell out of the buffer pool, it couldn&#8217;t insert more than 50 rows a second, even though there were many disks idle and capacity gods cried. It gets worse with out-of-page blobs &#8211; then every operation is pessimistic. </p>
<p>Of course, there&#8217;re ways to work around this &#8211; usually by taking the hit of sharding/partitioning (this is where common wisdom of &#8220;large tables need to be partitioned&#8221; mostly comes from). Then, like with MyISAM, one will have multiple table locks and there may be some scalability then. </p>
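<p>As a rough sketch of that partitioning workaround (the table name, column names and partition count below are hypothetical, chosen only for illustration), a hash-partitioned InnoDB table could be declared roughly like this. Each partition keeps its own B-Trees, so a pessimistic operation should only lock the index of the partition it touches:</p>
<pre>
-- hypothetical table; names, sizes and partition count are illustrative only
CREATE TABLE big_rows (
  id BIGINT UNSIGNED NOT NULL,
  payload VARBINARY(1024) NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB
PARTITION BY HASH (id) PARTITIONS 16;
</pre>
<p>Whether this helps depends on how writes spread across partitions; if most inserts land in one partition, they still queue up on that partition&#8217;s index lock.</p>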
<p>TL;DR: The InnoDB index lock is a major architectural performance flaw, and that is why you hear that large tables are slower. There&#8217;s a big chance that there are more scalable engines for on-disk writes out there, and all the large InnoDB write/insert benchmarks were severely hit by this. </p><br/>PlanetMySQL Voting:
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29278&vote=1&apivote=1">Vote UP</a> /
	 <a href="http://planet.mysql.com/entry/vote/?entry_id=29278&vote=-1&apivote=1">Vote DOWN</a>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/07/03/innodb-locking-makes-me-sad/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>A case for FORCE INDEX</title>
		<link>http://dom.as/2011/01/27/a-case-for-force-index/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=a-case-for-force-index</link>
		<comments>http://dom.as/2011/01/27/a-case-for-force-index/#comments</comments>
		<pubDate>Thu, 27 Jan 2011 04:57:58 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[facebook]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://dom.as/?p=834</guid>
		<description><![CDATA[I remember various discussions in different mediums where people were building cases against the use of FORCE INDEX in SQL queries. I&#8217;ll hereby suggest using it way more often, but first I&#8217;ll start with a small explanation.
For ages, the concept of index statistics affecting query plans has been clogging minds of DBAs, supported by long explanations of MyISAM and InnoDB manuals. Actually, statistics are used just for determining which index to use for a joined table, as predicate is not known at the time of &#8216;optimization&#8217;. 
What happens if you do a simple query like: 
SELECT * FROM table WHERE a=5 AND b=6

? If there&#8217;s an index that enforces uniqueness on (a,b), it will be used &#8211; this is short-path for PRIMARY KEY lookups. Otherwise, it will go to any index, composite or not, that can satisfy either a or b (or both), and evaluate how many rows it will fetch from it using the provided criteria. 
Now, contrary to what people usually think, the row count evaluation has nothing really much to do with cardinality statistics &#8211; instead it builds the range that the known predicate can check on existing index, and does two full B-Tree dives to the index &#8211; one at the start of the range, and one at the end of it. For each possible index.
This simply means that even if you are not using the index to execute query, two leaf pages (and all the tree branches to reach them) will end up being fetched from disk into the cache &#8211; wasting both I/O cycles and memory. 
There&#8217;s also quite an interesting paradox here &#8211; in some cases, the more similar the other indexes are, the more waste they create because of rows-in-range checks. If a table has indexes on (a,b,c) and (a,b,d), a query for (a,b,d) will be best satisfied by the (a,b,d) index, but will also evaluate range sizes for (a,b) on the (a,b,c) index. If the first index were (a,c,b), it would only be able to check the head and tail of (a) &#8211; so far fewer B-Tree positions would be cached in memory for the check. This makes better indexing sometimes fare worse than it is worth in benchmarks (assuming that people do I/O-heavy benchmarking :)
The easy way out is using FORCE INDEX. It will not do the index evaluation &#8211; and no B-Tree dives on unneeded index. 
In my edge case testing with real data and skewed access pattern hitting a second index during &#8216;statistics&#8217; phase has increased execution time by 70%, number of I/Os done by 75%, number of entrances into buffer pool by 31% and bloated buffer pool with data I didn&#8217;t need for read workload. 
For some queries like &#8220;newest 10 entries&#8221; this will actually waste some space preheating blocks from the other end of the range that will never be shown &#8211; there will definitely be a B-Tree leaf page in buffer pool with edits from few years ago because of RIR. Unfortunately, the only MySQL-side solution for this is HANDLER interface (or probably HandlerSocket) &#8211; but it doesn&#8217;t make using FORCE INDEX not worth it &#8211; it just pushes towards making FORCE INDEX be much more forceful. 
So, use the FORCE, Luke :)]]></description>
			<content:encoded><![CDATA[<p>I remember various discussions in different mediums where people were building cases against the use of FORCE INDEX in SQL queries. I&#8217;ll hereby suggest using it way more often, but first I&#8217;ll start with a small explanation.</p>
<p>For ages, the concept of index statistics affecting query plans has been clogging minds of DBAs, supported by long explanations of <a href="http://dev.mysql.com/doc/refman/5.1/en/myisam-index-statistics.html">MyISAM</a> and <a href="http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-other-changes-statistics-estimation.html">InnoDB</a> manuals. Actually, statistics are used just for determining which index to use for a joined table, as predicate is not known at the time of &#8216;optimization&#8217;. </p>
<p>What happens if you do a simple query like this?</p>
<blockquote><p>SELECT * FROM table WHERE a=5 AND b=6
</p></blockquote>
<p>If there&#8217;s an index that enforces uniqueness on (a,b), it will be used &#8211; this is the short path for PRIMARY KEY lookups. Otherwise, it will go to each index, composite or not, that can satisfy either a or b (or both), and evaluate how many rows it would fetch from it using the provided criteria. </p>
<p>Now, contrary to what people usually think, the row count evaluation has very little to do with cardinality statistics &#8211; instead it builds the range that the known predicate can check on an existing index, and does two full B-Tree dives into the index &#8211; one at the start of the range, and one at the end of it. It does this for each possible index.<br />
This simply means that even if you are not using the index to execute the query, two leaf pages (and all the tree branches needed to reach them) will end up being fetched from disk into the cache &#8211; wasting both I/O cycles and memory. </p>
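<p>As a rough illustration of the rows-in-range check described above, here is a toy Python model in which a sorted list stands in for a B-Tree index and each bisect call stands in for one dive. The function names and sample keys are assumptions made up for this sketch; it is not how InnoDB itself is implemented.</p>
<pre>
import bisect

def estimate_rows_in_range(index_keys, low, high):
    """Toy rows-in-range check: dive to both ends of the range.

    index_keys is a sorted list standing in for one B-Tree index.
    """
    start = bisect.bisect_left(index_keys, low)   # dive 1: start of the range
    end = bisect.bisect_right(index_keys, high)   # dive 2: end of the range
    return end - start

def dives_for_query(candidate_indexes):
    # Two dives per candidate index, whether or not it ends up being used.
    return 2 * len(candidate_indexes)

# Hypothetical (a, b) keys of an index on (a,b)
idx_ab = sorted([(5, 6), (5, 6), (5, 7), (4, 6), (9, 1)])
print(estimate_rows_in_range(idx_ab, (5, 6), (5, 6)))
print(dives_for_query(["a_b_c", "a_b_d"]))
</pre>
<p>The point of the model is the second function: the cost grows with the number of candidate indexes, not with the one index that finally runs the query.</p>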
<p>There&#8217;s also quite an interesting paradox here &#8211; in some cases, the more similar the other indexes are, the more waste they create because of rows-in-range checks. If a table has indexes on (a,b,c) and (a,b,d), a query on (a,b,d) will be best satisfied by the (a,b,d) index, but will also evaluate range sizes for (a,b) on the other index. If the first index were (a,c,b), it would only be able to check the head and tail of (a) &#8211; so far fewer B-Tree positions would be cached in memory for the check. This sometimes makes better indexing fare worse in benchmarks than it is worth (assuming that people do I/O-heavy benchmarking :)</p>
<p>The easy way out is using FORCE INDEX. It skips the index evaluation &#8211; so there are no B-Tree dives on unneeded indexes. </p>
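<p>For illustration, here is a minimal Python helper that builds such a query. The table name t and the index name idx_a_b are hypothetical, and the helper only formats a string; a real application should still pass values through its driver&#8217;s parameter binding.</p>
<pre>
def force_index_query(table, index, where):
    # Table and index names are hypothetical examples.
    # In MySQL the FORCE INDEX hint goes right after the table name.
    return "SELECT * FROM {0} FORCE INDEX ({1}) WHERE {2}".format(table, index, where)

print(force_index_query("t", "idx_a_b", "a=5 AND b=6"))
</pre>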
<p>In my edge-case testing with real data and a skewed access pattern, hitting a second index during the &#8216;statistics&#8217; phase increased execution time by 70%, the number of I/Os done by 75%, and the number of entrances into the buffer pool by 31%, and bloated the buffer pool with data I didn&#8217;t need for the read workload. </p>
<p>For some queries like &#8220;newest 10 entries&#8221; this will actually waste some space preheating blocks from the other end of the range that will never be shown &#8211; there will definitely be a B-Tree leaf page in the buffer pool with edits from a few years ago because of the rows-in-range (RIR) check. Unfortunately, the only MySQL-side solution for this is the HANDLER interface (or probably HandlerSocket) &#8211; but that doesn&#8217;t make FORCE INDEX any less worthwhile &#8211; it just argues for making FORCE INDEX much more forceful. </p>
<p>So, use the FORCE, Luke :) </p>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2011/01/27/a-case-for-force-index/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Logs memory pressure</title>
		<link>http://dom.as/2010/11/18/logs-memory-pressure/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=logs-memory-pressure</link>
		<comments>http://dom.as/2010/11/18/logs-memory-pressure/#comments</comments>
		<pubDate>Thu, 18 Nov 2010 12:59:33 +0000</pubDate>
		<dc:creator>Domas Mituzas</dc:creator>
				<category><![CDATA[directio]]></category>
		<category><![CDATA[facebook]]></category>
		<category><![CDATA[InnoDB]]></category>
		<category><![CDATA[io]]></category>
		<category><![CDATA[memory]]></category>
		<category><![CDATA[mysql]]></category>

		<guid isPermaLink="false">http://dom.as/?p=818</guid>
		<description><![CDATA[Warning, this may be kernel version specific, albeit this kernel is used by many database systems
Lately I&#8217;ve been working on getting more memory used by the InnoDB buffer pool &#8211; besides obvious things like the InnoDB memory tax, there were seemingly external factors pushing MySQL out into swap (even with swappiness=0). We worked a lot on picking low-hanging fruit like scripts that use too much memory; those seem to be mostly gone, yet MySQL still has way too much memory pressure from outside. 
I grabbed my uncache utility to assist with the investigation and started uncaching various bits on two systems, one that had a larger buffer pool (60G), which was already being sent to swap, and a conservatively allocated (55G) machine, both 72G boxes. Initial findings were somewhat surprising &#8211; apparently on both machines most of the external-to-mysqld memory was consumed by two sets of items:

binary logs &#8211; write once, read only tail (sometimes, if MySQL I/O cache cannot satisfy) &#8211; we saw nearly 10G consumed by binlogs on conservatively allocated machines
transaction logs &#8211; write many, read never (by MySQL), buffered I/O &#8211; full set of transaction logs was found in memory

It was remarkably easy to get rid of binlogs from cache, either by calling &#8216;uncache&#8217; from scripts or by using this tiny Python class:

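import ctypes
import io
libc = ctypes.CDLL("libc.so.6")
class cachedfile(io.FileIO):
    FADV_DONTNEED = 4  # value of POSIX_FADV_DONTNEED on Linux
    def uncache(self):
        # offset 0, length 0: drop cached pages for the whole file
        libc.posix_fadvise(self.fileno(), 0, 0, self.FADV_DONTNEED)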

As they were a major source of memory stress, it was something of a no-brainer that binlogs have to be removed from cache &#8211; something that can be serially re-read was taking space away from a buffer pool that avoids random reads. It may even make sense to call posix_fadvise() right after writing to them. 
Transaction logs, on the other hand, are an entirely different beast. From the MySQL perspective they should be uncached immediately, as nobody ever reads them (crash recovery aside, but re-reading them then is relatively cheap, as no writes or random reads are done during the log read phase). Unfortunately, the problem lies way below MySQL, and thanks to PeterZ for reminding me (we had a small chat about this at Jeremy&#8217;s Silicon Valley MySQL Meetup).
MySQL transaction records are stored in multiple log groups per transaction, then written out as per-log-group writes (each a multiple of 512 bytes), followed by fsync(). This allows the FS to do a transaction log write as a single I/O operation. It also means partial page writes to buffered files &#8211; a write overwrites existing data in only part of a page, so if the page is not cached, it first has to be read from storage. 
So, if all transaction log pages are removed from cache, quite a few of them will have to be read back in (depending on the sizes of transactions, probably all of them in some cases). Oddly enough, when I tried to hit the edge case, single-thread transactions per second remained the same, but I saw consistent read I/O traffic on the disks. So, this would probably work on systems that have spare I/O capacity (e.g. flash-based ones). 
Of course, as writes are already in multiples of 512 (and it appears that memory got allocated just fine), I could try out direct I/O &#8211; it should avoid the page read-in problem and not cause any memory pressure by itself. In this case switching InnoDB to use O_DIRECT was a bit dirtier &#8211; one needs to edit the source code, rebuild the server, restart, etc., or&#8230;

# lsof ib_logfile*
# gdb -p $(pidof mysqld)
(gdb) call os_file_set_nocache(9, "test", "test")
(gdb) call os_file_set_nocache(10, "test", "test")

I did not remove the fsync() call; as it is more or less a no-op on O_DIRECT files, I left it there. Removing it would probably change benchmark results, but not by much.
Some observations:

O_DIRECT was ~10% faster in the best-case scenario &#8211; lots of tiny transactions in a single thread
If group commit is used (without binlogs), InnoDB can do way more transactions with multiple threads using buffered I/O, as it does multiple writes per fsync
Enabling sync_binlog makes the difference smaller &#8211; even with many parallel writes, direct writes are 10-20% slower than buffered ones
Same for innodb_flush_log_at_trx_commit != 0 &#8211; multiple writes per fsync are much more efficient with buffered I/O
One would need to merge log groups to have more efficient O_DIRECT for larger transactions
O_DIRECT has no theoretical disadvantage; the current deficiencies just come from an implementation oriented towards buffered I/O &#8211; and can be resolved by (in some areas &#8211; extensive) engineering
YMMV. In certain cases it definitely makes sense even right now, in some others &#8211; not so much

So, the outcome here depends on many variables &#8211; with flash, read-on-write is not as expensive, especially if read-ahead works. With disks one has to see what the better use for the memory is &#8211; using it for the buffer pool reduces the amount of data reads, but causes log reads. And of course, O_DIRECT wins in the long run :-)
With this data moved out of the cache and the InnoDB memory tax reduced, one could switch from using 75% of memory to 90% or even 95% for InnoDB buffer pools. Yay?]]></description>
			<content:encoded><![CDATA[<p><i>Warning, this may be kernel version specific, albeit this kernel is used by many database systems</i></p>
<p>Lately I&#8217;ve been working on getting more memory used by the InnoDB buffer pool &#8211; besides obvious things like the InnoDB <a href="http://dom.as/2008/05/29/wasting-innodb-memory/">memory tax</a>, there were seemingly external factors pushing MySQL out into swap (even with swappiness=0). We worked a lot on picking low-hanging fruit like scripts that use too much memory; those seem to be mostly gone, yet MySQL still has way too much memory pressure from outside. </p>
<p>I grabbed my <a href="http://dom.as/2009/06/26/uncache/">uncache</a> utility to assist with the investigation and started uncaching various bits on two systems, one that had a larger buffer pool (60G), which was already being sent to swap, and a conservatively allocated (55G) machine, both 72G boxes. Initial findings were somewhat surprising &#8211; apparently on both machines most of the external-to-mysqld memory was consumed by two sets of items:</p>
<ul>
<li><b>binary logs</b> &#8211; write once, read only tail (sometimes, if MySQL I/O cache cannot satisfy) &#8211; we saw nearly 10G consumed by binlogs on conservatively allocated machines</li>
<li><b>transaction logs</b> &#8211; write many, read never (by MySQL), buffered I/O &#8211; full set of transaction logs was found in memory</li>
</ul>
<p>It was remarkably easy to get rid of binlogs from cache, either by calling &#8216;uncache&#8217; from scripts or by using this tiny Python class:</p>
<pre>
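import ctypes
import io
libc = ctypes.CDLL("libc.so.6")
class cachedfile(io.FileIO):
    FADV_DONTNEED = 4  # value of POSIX_FADV_DONTNEED on Linux
    def uncache(self):
        # offset 0, length 0: drop cached pages for the whole file
        libc.posix_fadvise(self.fileno(), 0, 0, self.FADV_DONTNEED)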
</pre>
<p>As they were a major source of memory stress, it was something of a no-brainer that binlogs have to be removed from cache &#8211; something that can be serially re-read was taking space away from a buffer pool that avoids random reads. It may even make sense to call posix_fadvise() right after writing to them. </p>
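<p>The idea of advising right after a write could be sketched as follows, using the os.posix_fadvise call that Python 3.3+ provides on Unix. The helper name and the file name are assumptions for illustration; this is not what MySQL itself does when writing binlogs.</p>
<pre>
import os
import tempfile

def write_and_uncache(path, data):
    """Append data, flush it to disk, then hint the kernel to drop cached pages."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # dirty pages cannot be dropped, so flush first
        # offset 0, length 0 means "the whole file"
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return os.path.getsize(path)

# Hypothetical file name, written into a throwaway directory
with tempfile.TemporaryDirectory() as d:
    print(write_and_uncache(os.path.join(d, "binlog.000001"), b"x" * 1024))
</pre>
<p>The fsync() before the advice matters in this sketch: the kernel only drops clean pages, so advising on freshly written, still-dirty data would have little effect.</p>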
<p>Transaction logs, on the other hand, are an entirely different beast. From the MySQL perspective they should be uncached immediately, as nobody ever reads them (crash recovery aside, but re-reading them then is relatively cheap, as no writes or random reads are done during the log read phase). Unfortunately, the problem lies way below MySQL, and thanks to PeterZ for reminding me (we had a small chat about this at Jeremy&#8217;s <a href="http://www.meetup.com/mysql-silicon-valley/">Silicon Valley MySQL Meetup</a>).</p>
<p>MySQL transaction records are stored in multiple log groups per transaction, then written out as per-log-group writes (each a multiple of 512 bytes), followed by fsync(). This allows the FS to do a transaction log write as a single I/O operation. It also means partial page writes to buffered files &#8211; a write overwrites existing data in only part of a page, so if the page is not cached, it first has to be read from storage. </p>
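<p>To make the partial page write concrete, here is a small Python sketch that lists which page-cache pages a write only partly covers. It assumes 4096-byte pages, and the function name is made up for this example; a partly covered page needs a read from storage only when it is not already cached.</p>
<pre>
PAGE_SIZE = 4096  # assumed page-cache page size

def pages_needing_read(offset, length, page_size=PAGE_SIZE):
    """Return page numbers only partly covered by a write of length bytes at offset."""
    if length == 0:
        return []
    end = offset + length
    first = offset // page_size
    last = (end - 1) // page_size
    partial = set()
    if offset % page_size != 0:
        partial.add(first)   # write starts in the middle of a page
    if end % page_size != 0:
        partial.add(last)    # write stops in the middle of a page
    return sorted(partial)

print(pages_needing_read(0, 4096))     # []
print(pages_needing_read(512, 1024))   # [0]
print(pages_needing_read(3584, 1024))  # [0, 1]
</pre>
<p>Only a write that starts and ends on page boundaries avoids the read-in, which 512-byte-multiple log writes rarely do.</p>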
<p>So, if all transaction log pages are removed from cache, quite a few of them will have to be read back in (depending on the sizes of transactions, probably all of them in some cases). Oddly enough, when I tried to hit the edge case, single-thread transactions per second remained the same, but I saw consistent read I/O traffic on the disks. So, this would probably work on systems that have spare I/O capacity (e.g. flash-based ones). </p>
<p>Of course, as writes are already in multiples of 512 (and it appears that memory got allocated just fine), I could try out direct I/O &#8211; it should avoid the page read-in problem and not cause any memory pressure by itself. In this case switching InnoDB to use O_DIRECT was a bit dirtier &#8211; one needs to edit the source code, rebuild the server, restart, etc., or&#8230;<br />
<code><br />
# lsof ib_logfile*<br />
# gdb -p $(pidof mysqld)<br />
(gdb) call os_file_set_nocache(9, "test", "test")<br />
(gdb) call os_file_set_nocache(10, "test", "test")<br />
</code><br />
I did not remove the fsync() call; as it is more or less a no-op on O_DIRECT files, I left it there. Removing it would probably change benchmark results, but not by much.</p>
<p>Some observations:</p>
<ul>
<li>O_DIRECT was ~10% faster in the best-case scenario &#8211; lots of tiny transactions in a single thread</li>
<li>If group commit is used (without binlogs), InnoDB can do way more transactions with multiple threads using buffered I/O, as it does multiple writes per fsync</li>
<li>Enabling sync_binlog makes the difference smaller &#8211; even with many parallel writes, direct writes are 10-20% slower than buffered ones</li>
<li>Same for innodb_flush_log_at_trx_commit != 0 &#8211; multiple writes per fsync are much more efficient with buffered I/O</li>
<li>One would need to merge log groups to have more efficient O_DIRECT for larger transactions</li>
<li>O_DIRECT has no theoretical disadvantage; the current deficiencies just come from an implementation oriented towards buffered I/O &#8211; and can be resolved by (in some areas &#8211; extensive) engineering</li>
<li>YMMV. In certain cases it definitely makes sense even right now, in some others &#8211; not so much</li>
</ul>
<p>So, the outcome here depends on many variables &#8211; with flash, read-on-write is not as expensive, especially if read-ahead works. With disks one has to see what the better use for the memory is &#8211; using it for the buffer pool reduces the amount of data reads, but causes log reads. And of course, O_DIRECT wins in the long run :-)</p>
<p>With this data moved out of the cache and the InnoDB memory tax reduced, one could switch from using 75% of memory to 90% or even 95% for InnoDB buffer pools. Yay?</p>]]></content:encoded>
			<wfw:commentRss>http://planetmysql.ru/2010/11/18/logs-memory-pressure/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

