Archive for the ‘Web 2.0’ Category

Could closed core prove a more robust model than open core?

December 2nd, 2011

When participating recently in a sprint held at Google to document four free software projects, I thought about what might have prompted Google to invest in this effort. Their willingness to provide a hotel, work space, and food for some thirty participants, along with staff support all week long, demonstrates their commitment to nurturing open source.

Google is one of several companies for which I'll coin the term "closed core." The code on which they build their business and make their money is secret. (And given the enormous infrastructure it takes to provide a search service, opening the source code wouldn't do much to stimulate competition, as I point out in a posting on O'Reilly's Radar blog.) But they depend on a huge range of free software, ranging from Linux running on their racks to numerous programming languages and libraries that they've drawn on to develop their services.

So Google contributes a lot back to the free software community. They release code for many non-essential functions. They promote the adoption of standards such as HTML 5. They have been among the first companies to offer APIs for important functions, including their popular Google Maps. They have opened the source code to Android (although its development remains under their control), which has been the determining factor in making Android devices compete with the arguably more highly functioning iOS products. They even created a whole new programming language (Go) and are working on another.

Google is not the only "closed core" company (for instance, Facebook has also built their service around APIs and released their Cassandra project). Microsoft has a whole open source program, including some important contributions to health IT. Scads of other companies, such as IBM, Hewlett Packard, and VMware, have complex relationships to open source software that don't fit a simple "open core" or "closed core" model. But the closed core trend represents a fertile collaboration between communities and companies that have businesses in specific areas. The closed core model requires businesses to determine where their unique value lies and to be generous in offering the public extra code that supports their infrastructure but does not drive revenue.

This model may prove more robust and lasting than open core, which attracts companies occupying minor positions in their industries. The shining example of open core is MySQL, but its complex status, including a long history of dual licensing and simultaneous development by several organizations, makes it a difficult model from which to draw lessons about the whole movement. In particular, Software as a Service redefines the relationships that the free software movement has traditionally defined between open and proprietary. Deploying and monitoring the core SaaS software creates large areas for potential innovation, as we saw with Cassandra, where a company can benefit from turning their code into a community project.



What is the biggest challenge for Big Data?

September 9th, 2011

Often I think about challenges that organizations face with “Big Data”. While Big Data is a generic and overused term, what I am really referring to is an organization’s ability to disseminate, understand and ultimately benefit from increasing volumes of data. It is almost without question that in the future customers will be won/lost, competitive advantage will be gained/forfeited and businesses will succeed/fail based on their ability to leverage their data assets.

What I think the near-term challenges are may be surprising. Largely, I don’t think they are purely technical. There are enough wheels in motion now to almost guarantee that data accessibility will continue to improve at a pace in line with the increase in data volume. Sure, there will continue to be lots of interesting innovation in technology, but when organizations like Google are doing 10PB sorts on 8000 machines in just over 6 hours, we know the technical scope for Big Data exists and will eventually flow down to the masses. Such scale will likely be achievable by most organizations in the next decade.

Instead I think the core problem that needs to be addressed relates to people and skills. There are lots of technical engineers who can build distributed systems, and orders of magnitude more who can operate them and fill them to the brim with captured data. But where I think we are lacking skills is with people who know what to do with the data. People who know how to make it actually useful. Sure, a BI industry exists today, but I think it is currently more focused on the engineering challenges of providing an organization with faster/easier access to its existing knowledge than on reaching out into the distance and discovering new knowledge. The people with pure data analysis and knowledge discovery skills are much harder to find, and these are the people who are going to be front and center driving the big data revolution. People you can give a few PB of data to, and who can give you back information, discoveries, trends, factoids, patterns, beautiful visualizations and needles you didn’t even know were in the haystack.

These are people who can make a real and significant impact on an organization’s bottom line, or help solve some of the world’s problems when applied to R&D. Data Geeks are the people to be revered in the future, and hopefully we will see a steady increase in people wanting to grow up to be Data Scientists.



Building data startups: Fast, big, and focused

August 9th, 2011

This is a written follow-up to a talk presented at a recent Strata online event.

A new breed of startup is emerging, built to take advantage of the rising tides of data across a variety of verticals and the maturing ecosystem of tools for its large-scale analysis.

These are data startups, and they are the sumo wrestlers on the startup stage. The weight of data is a source of their competitive advantage. But like their sumo mentors, size alone is not enough. The most successful of data startups must be fast (with data), big (with analytics), and focused (with services).

Setting the stage: The attack of the exponentials

The reason this style of startup is arising today, rather than a decade ago, is a confluence of forces that I call the Attack of the Exponentials. In short, over the past five decades, the cost of storage, CPU, and bandwidth has been exponentially dropping, while network access has exponentially increased. In 1980, a terabyte of disk storage cost $14 million. Today, it's at $30 and dropping. Classes of data that were previously economically unviable to store and mine, such as machine-generated log files, now represent prospects for profit.
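To put those two price points in perspective, here is a rough back-of-the-envelope sketch. It assumes a smooth exponential decline between 1980 and 2011 (taking "today" to be the year of this post), which is a simplification; only the two dollar figures come from the text, and the function names are made up for illustration.

```python
import math

# Price points quoted in the text (US dollars per terabyte of disk).
COST_1980 = 14_000_000
COST_2011 = 30
YEARS = 2011 - 1980  # assumption: "today" means 2011, the year of this post


def halving_time_years(start_cost, end_cost, years):
    """Years for the price to halve, assuming a smooth exponential decline."""
    halvings = math.log2(start_cost / end_cost)
    return years / halvings


def annual_decline(start_cost, end_cost, years):
    """Fraction by which the price falls each year under the same assumption."""
    return 1 - (end_cost / start_cost) ** (1 / years)


if __name__ == "__main__":
    halving = halving_time_years(COST_1980, COST_2011, YEARS)
    decline = annual_decline(COST_1980, COST_2011, YEARS)
    print(f"halving time: {halving:.2f} years")
    print(f"annual decline: {decline:.0%}")
```

Under that assumption the price of a terabyte halves a little more often than every two years, which is the kind of compounding that turns "too expensive to keep" data into something worth mining.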

Attack of the exponentials

At the same time, these technological forces are not symmetric: CPU and storage costs have fallen faster than those of network and disk IO. Thus data is heavy; it gravitates toward centers of storage and compute power in proportion to its mass. Migration to the cloud is the manifest destiny for big data, and the cloud is the launching pad for data startups.

Leveraging the big data stack


As the foundational layer in the big data stack, the cloud provides the scalable persistence and compute power needed to manufacture data products.

At the middle layer of the big data stack is analytics, where features are extracted from data, and fed into classification and prediction algorithms.

Finally, at the top of the stack are services and applications. This is the level at which consumers experience a data product, whether it be a music recommendation or a traffic route prediction.

Let's take each of these layers and discuss the competitive axes at each.

The emerging big data stack
The competitive axes and representative technologies on the Big Data stack are illustrated here. At the bottom tier of data, free tools are shown in red (MySQL, Postgres, Hadoop), and we see how their commercial adaptations (InfoBright, Greenplum, MapR) compete principally along the axis of speed, offering faster processing and query times. Several of these players are pushing up towards the second tier of the data stack, analytics. At this layer, the primary competitive axis is scale: few offerings can address terabyte-scale data sets, and those that do are typically proprietary. Finally, at the top layer of the big data stack lie the services that touch consumers and businesses. Here, focus within a specific sector, combined with depth that reaches downward into the analytics tier, is the defining competitive advantage.

Fast data

At the base of the big data stack, where data is stored, processed, and queried, the dominant axis of competition was once scale. But as cheaper commodity disks and Hadoop have effectively addressed scalable persistence and processing, the focus of competition has shifted toward speed. The demand for faster disks has led to an explosion of interest in solid-state disk firms, such as Fusion-io, which went public recently. And several startups, most notably MapR, are promising faster versions of Hadoop.

Fusion-io and MapR represent another trend at the data layer: commercial technologies that challenge open source or commodity offerings on an efficiency basis, namely watts or CPU cycles consumed. With energy costs driving between one-third and one-half of data center operating costs, these efficiencies have a direct financial impact.

Finally, just as many large-scale, NoSQL data stores are moving from disk to SSD, others have observed that many traditional, relational databases will soon be entirely in memory. This is particularly true for applications that require repeated, fast access to a full set of data, such as building models from customer-product matrices. This brings us to the second tier of the big data stack, analytics.

Big analytics

At the second tier of the big data stack, analytics is the brains to cloud computing's brawn. Here, however, speed is less of a challenge; given an addressable data set in memory, most statistical algorithms can yield results in seconds. The challenge is scaling these out to address large datasets, and rewriting algorithms to operate in an online, distributed manner across many machines.

Because data is heavy and algorithms are light, one key strategy is to push code deeper to where the data lives, to minimize network IO. This often requires a tight coupling between the data storage layer and the analytics, and algorithms often need to be rewritten as user-defined functions (UDFs) in a language compatible with the data layer. Greenplum, leveraging its Postgres roots, supports UDFs written in both Java and R. Following Google's BigTable, HBase is introducing coprocessors in its 0.92 release, which allow Java code to be associated with data tablets and minimize data transfer over the network. Netezza pushes even further into hardware, embedding an array of functions into FPGAs that are physically co-located with the disks of its storage appliances.
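The idea of moving code to the data can be sketched in a few lines of plain Python. This is not the Greenplum UDF interface or the HBase coprocessor API; `StorageNode`, `run_local`, and the other names are hypothetical stand-ins, chosen only to show that each node returns a tiny summary rather than its raw rows.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StorageNode:
    """Hypothetical stand-in for one machine holding a slice of the data."""
    rows: List[float]

    def run_local(
        self, fn: Callable[[List[float]], Tuple[float, int]]
    ) -> Tuple[float, int]:
        # The function travels to the node; only its small result travels back.
        return fn(self.rows)


def partial_sum_count(rows: List[float]) -> Tuple[float, int]:
    """Runs next to the data and returns a (sum, count) summary, not the rows."""
    return (sum(rows), len(rows))


def distributed_mean(nodes: List[StorageNode]) -> float:
    """Combine the per-node summaries into a global mean."""
    partials = [node.run_local(partial_sum_count) for node in nodes]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    if count == 0:
        raise ValueError("no rows on any node")
    return total / count


if __name__ == "__main__":
    nodes = [
        StorageNode([1.0, 2.0, 3.0]),
        StorageNode([4.0, 5.0]),
        StorageNode([6.0]),
    ]
    print(distributed_mean(nodes))  # 3.5
```

However many rows each node holds, only two numbers per node cross the network, which is the whole point of rewriting an algorithm so that it can run where the data sits.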

The field of what's alternatively called business or predictive analytics is nascent, and while a range of enabling tools and platforms exist (such as R, SPSS, and SAS), most of the algorithms developed are proprietary and vertical-specific. As the ecosystem matures, one may expect to see the rise of firms selling analytical services, such as recommendation engines, that interoperate across data platforms. But in the near term, consultancies like Accenture and McKinsey are positioning themselves to provide big analytics via billable hours.

Outside of consulting, firms with analytical strengths push upward, surfacing focused products or services to achieve success.


Focused services

The top of the big data stack is where data products and services directly touch consumers and businesses. For data startups, these offerings more frequently take the form of a service, offered as an API rather than a bundle of bits.

BillGuard is a great example of a startup offering a focused data service. It monitors customers' credit card statements for dubious charges, and even leverages the collective behavior of users to improve its fraud predictions.

Several startups are working on algorithms that can crack the content relevance nut, including Flipboard and News.me. Klout delivers a pure data service that uses social media activity to measure online influence. My company, Metamarkets, crunches server logs to provide pricing analytics for publishers.

For data startups, data processes and algorithms define their competitive advantage. Poor predictions — whether of fraud, relevance, influence, or price — will sink a data startup, no matter how well-designed their web UI or mobile application.

Focused data services aren't limited to startups: LinkedIn's People You May Know and FourSquare's Explore feature enhance engagement of their companies' core products, but only when they correctly suggest people and places.

Democratizing big data

The axes of strategy in the big data stack show analytics to be squarely at the center. Data platform providers are pushing upwards into analytics to differentiate themselves, touting support for fast, distributed code execution close to the data. Traditional analytics players, such as SAS and SAP, are expanding their storage footprints and challenging the need for alternative data platforms as staging areas. Finally, data startups and many established firms are creating services whose success hinges directly on proprietary analytics algorithms.

The emergence of data startups highlights the democratizing consequences of a maturing big data stack. For the first time, companies can successfully build offerings without deep infrastructure know-how and focus at a higher level, developing analytics and services. By all indications, this is a democratic force that promises to unleash a wave of innovation in the coming decade.





IA Ventures — Jobs shout out

August 4th, 2011

My friends over at IA Ventures are looking to add both an Analyst and an Associate to their team. If Big Data, New York and start-ups are in your blood, then I can’t think of a better VC to be involved with.

From the IA blog:

"IA Ventures funds early-stage Big Data companies creating competitive advantage through data and we’re looking for two start-up junkies to join our team – one full-time associate / community manager and one full time analyst. Because there are only four of us (we’re a start-up ourselves, in fact), we’ll need you to help us investigate companies, learn about industries, develop investment theses, perform internal operations, organize community events, and work with portfolio companies—basically, you can take on as much responsibility as you can handle."

Roger, Brad and the team continue to impress with their focus on Big Data, their strategic investments in monetizing data and knowledge of the industry in general.



Realtime Data Pipelines

August 1st, 2011

In life there are really two major types of data analytics. Firstly, we don’t know what we want to know, so we need analytics to tell us what is interesting. This is broadly called discovery. Secondly, we already know what we want to know, and we just need analytics to tell us this information, often repeatedly and as quickly as possible. This goes by anything from reporting or dashboarding to more general data transformation and so on.

Typically we use the same techniques to achieve both. We shove lots of data into a repository of some form (SQL, MPP SQL, NoSQL, HDFS etc.) and then run queries/jobs/processes across that data to retrieve the information we care about.

Now this makes sense for data discovery. If we don’t know what we want to know, having lots of data in a big pile that we can slice and dice in interesting ways is good. But when we already know what we want to know, continued batch-based processing across mounds of data to produce “updated” results, from data that is often changing constantly, can be highly inefficient.

Enter Realtime Data Pipelines. Data is fed in one end, results are computed in real time as data flows down the pipeline, and they come out the other end whenever relevant changes we care about occur. Data pipelines/workflows/streams are becoming much more relevant for processing massive amounts of data with real-time results. Moving relevant forms of analytics out of large repositories and into the actual data flow from producer to consumer will, I believe, be a fundamental step forward in big data management.
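As a toy illustration of the difference from batch processing, here is a minimal sketch using plain Python generators. It does not represent any particular streaming product; the stage names, the event format (key, amount) and the threshold rule are all assumptions made up for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, int]


def running_totals(events: Iterable[Event]) -> Iterator[Event]:
    """Update a per-key total as each event arrives, without re-scanning history."""
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in events:
        totals[key] += amount
        yield key, totals[key]


def threshold_alerts(totals: Iterable[Event], threshold: int) -> Iterator[Event]:
    """Emit a result only the first time a key's total reaches the threshold."""
    alerted = set()
    for key, total in totals:
        if total >= threshold and key not in alerted:
            alerted.add(key)
            yield key, total


if __name__ == "__main__":
    stream = [("a", 40), ("b", 10), ("a", 70), ("b", 95), ("a", 5)]
    # Stages are chained; each event flows through both before the next is read.
    for key, total in threshold_alerts(running_totals(stream), threshold=100):
        print(key, total)
```

Each event is touched once, state is kept small and incremental, and output appears only when something we care about changes, rather than after another full pass over the repository.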

There are some emerging technologies looking to address this; more details to follow.

 



What Scales Best?

July 29th, 2011

It is a constant, yet interesting debate in the world of big data.  What scales best?  OldSQL, NoSQL, NewSQL?

I have a longer post coming on this soon.  But for now, let me make the following comments.  Generally, most data technologies can be made to scale - somehow.  Scaling up tends not to be too much of an issue, scaling out is where the difficulties begin.  Yet, most data technologies can be scaled in one form or another to meet a data challenge even if the result isn’t pretty. 

What is best?  Well that comes down to the resulting complexity, cost, performance and other trade-offs.  Trade-offs are key as there are almost always significant concessions to be made as you scale up.

In a recent example of mine, I was looking at scalability aspects of MySQL, in particular MySQL Cluster. It is actually pretty easy to make it scale. A 5 node cluster on AWS was able to process a sustained rate of 371,000 insert transactions per second. Good scalability, yes, but there were many trade-offs made around availability, recoverability and non-insert query performance to achieve it. For the particular requirement I was looking at, though, it fitted very well.

So what is this all about? Well, if a Social Network is running MySQL in a sharded cluster to achieve the scale necessary to support its many millions of users, the fact that database technology x or database technology y can also scale with different “costs” or trade-offs doesn’t necessarily make it any better for them. If you, for example, have some of the smartest and most talented MySQL developers on your team and can alter the code at a moment’s notice to meet a new requirement, that alone might make your choice of MySQL “better” than using NoSQL database xyz from a proprietary vendor, where there may be a loss of flexibility and control from soup to nuts.

So what is my point? Well, I guess what I am saying is that physical scalability is of course an important consideration in determining what is best. But it is only one side of the coin. What it “costs” you in terms of complexity, actual dollars, performance, flexibility, availability, consistency and so on is important too. And these costs are often relative; what is complex for you may not be complex for someone else.

 



Intellectual property gone mad

July 18th, 2011

Friday night, I tweeted a link to a Guardian article stating that app developers were withdrawing apps from Apple's app store and Google's Android market (and presumably also Amazon's app store), because they feared becoming victims of a patent trolling lawsuit. That tweet elicited some interesting responses that I'd like to discuss.

The insurance solution?

One option might be to rely on the insurance industry to solve the problem. "Isn't this what insurance is supposed to be for? Couldn't all these developers set up a fund for their common defense?" wrote @qckbrnfx. An interesting idea, and one I've considered. But that's a cure that seems worse than the disease. First, it's not likely to be a cure. How many insurance companies actually defend their clients against an unreasonable lawsuit? They typically don't. They settle out of court and your insurance premium goes up.

@qckbrnfx (Q.B. Fox, Esq.): "@mikeloukides Isn't this what insurance is supposed to be for? Couldn't all these developers set up a fund for their common defense?"

If you look at medical malpractice insurance, where unfounded malpractice claims are the equivalent of trolling, I would bet that the willingness of insurance companies to settle out of court increases trolling. An insurance solution to the problem of trolling would be, effectively, a tax on the software developers. And we would soon be in a situation where insurance companies were specifying who could develop software (after a couple of malpractice cases, a doctor becomes uninsurable, and he's effectively out of the business, regardless of the merits of those cases), what software they could develop, and so on. Percy Shelley once said that "poets are the unacknowledged legislators of the world." But my more cynical variation is that the insurance companies are the world's unacknowledged legislators. I don't want to see the software industry dancing to the insurance industry's tune. Some fear big government. I fear big insurance much more.

Fighting back?

There's a variant of the insurance solution that I like: @patentbuzz said: "Developers need to unite and crowdfund re-exam of obnoxious troll patents. Teach them a lesson." This isn't "insurance" in the classic risk-spreading sense: this is going on the offensive, and pooling funds to defend against trolling. I do not think it would take a lot of effort to make trolling (at least, the sort of low-level trolling that we're looking at here) unprofitable, and as soon as it becomes unprofitable, it will stop. Small-time app developers can't afford lawyers, which is precisely why trolling is so dangerous. But here's the secret: most patent trolls can't afford lawyers, either. They can afford enough lawyering to write a few cease and desist letters, and to settle out of court, but their funds would be exhausted fairly quickly if even a small percentage of their victims tried to fight back.

@patentbuzz (Mark Nowotarski): "@mikeloukides Developers need to unite and crowdfund reexam of obnoxious troll patents. Teach them a lesson http://t.co/8wFkyFQ"

This is precisely where the big players need to get into the game. Apple has tried to give their app developers some legal cover, but as far as I know, they have not stepped in to pay for anyone's defense. Neither has Google. It's time for Apple and Google to step up to the plate. I am willing to bet that, if Apple or Google set up a defense fund, trolling would stop really quickly.

Blocking sale of patents?

A large part of the patent problem is that patents are transferable. @_philjohn asks "Do you think changing law to prevent transfer of patents could reduce the patent troll problem?" On one level, this is an attractive solution. But I'm wary: not about patent reform in itself (which is absolutely necessary), but because I've worked for a startup that went out of business. They had a small intellectual property portfolio, and the sale of that portfolio paid for my (substantial) unused vacation time. That's not how things are supposed to happen, but when startups go out of business, they don't always shut down nicely. It's worth asking what the cost would be if patents and other kinds of intellectual property were non-transferable. Would venture capitalists be less likely to invest, would startups fail sooner, if it were impossible to sell intellectual property assets? I suspect not, but it isn't a simple question.


A call to action

Patent and copyright law in the U.S. derives from the Constitution, and it's for a specific purpose: "To promote the progress of science and useful arts" (Article I, section 8). If app developers are being driven out of the U.S. market by patent trolling, patent law is failing in its constitutional goal; indeed, it's forcing "science and the useful arts" to take place elsewhere. That's a problem that needs to be addressed, particularly at a time when the software industry is one of the few thriving areas of the U.S. economy, and when startups (and in my book, that includes independent developers) drive most of the potential for job growth in the economy.

I don't see any relief coming from the patent system as it currently exists. The bigger question is whether software should be patentable at all. As Nat Torkington (@gnat) has reported, New Zealand's Parliament has a bill before it that will ban software patents, despite the lobbying of software giants in the U.S. and elsewhere. Still, at this point, significant changes to U.S. patent law belong in the realm of pleasant fantasy. Much as I would like to see it happen, I can't imagine Congress standing up to an onslaught of lobbyists paid by some of the largest corporations in the U.S.

One dimension of the problem is relatively simple: too many patent applications, too few patent office staff reviewing those applications, and not enough technical expertise on that staff to evaluate the applications properly. It doesn't help that patents are typically written to be as vague and broad as possible, without being completely meaningless. (As the staff tech writer at that startup, I had a hand in reviewing some of my former employer's patent applications.) So you frequently can't tell what was actually patented, and an alleged "infringement" can take place that has little to do with the original invention. Tim O'Reilly (@timoreilly) suggested a return to the days when a patent application had to include the actual invention (for software, that could mean source code) being patented. This would reduce much of the ambiguity in what was actually patented, and might prevent some kinds of abuse. Whatever form it takes, better scrutiny on the part of the patent office would be a big help. But is that conceivable in these days of government spending cuts and debt ceilings? Larger filing fees, to support the cost of more rigorous examination, are probably a non-starter, given the current allergy to anything that looks like a "tax." However, inadequate review of patent applications effectively imposes a much larger (and unproductive) tax on the small developers who can least afford it.

If we can't rely on the patent office to do a better job of reviewing patents, the task falls to the Apples and Googles of the world — the deep-pocketed players who rely on small developers — to get into the game and defend their ecosystems. But though that's a nice idea, there are many reasons to believe it will never happen, not the least of which is that the big players are too busy suing each other.

Apple and Google, are you listening? Your communities are at stake. Now's the time to show whether you really care about your developers.

Crowdfunding the defense of small developers may be the best solution for the immediate problem. Is this a viable Kickstarter project? It probably would be the largest project Kickstarter has ever attempted. Would a coalition of patent attorneys be willing to be underpaid while they contribute to the public good? I'd be excited to see such a project start. This could also be a project for the EFF. The EFF has the expertise, they list "innovation" and "fair use" among their causes, and they talk explicitly about trolling on their intellectual property page. But they've typically involved themselves in a smaller number of relatively high-profile cases. Are they willing to step in on a larger (or smaller, as the case may be) scale?

None of these solutions addresses the larger problems with patents and other forms of intellectual property, but perhaps we're better off with baby steps. Even the baby steps aren't simple, but it's time to start taking them.


Who/What to acquire next

March 18th, 2011

Well, as predicted, with Aster Data recently being picked up by Teradata, most of the key new-generation MPP distributed analytics vendors have been acquired (Aster Data, Vertica, Netezza & Greenplum). This had to happen and was expected to happen. The MPP analytics startup “revolution” is over and these technologies will now be integrated into the mainstream.

So what’s next? As we know, if you are a massive multinational software company, it is a lot less risky to innovate incrementally and leave the development of “game changing” technologies to startups that can be acquired after they prove both the tech and the market. So what follows MPP?

NoSQL technologies seem the only likely candidate at the moment, although I think it is a few years too early for any major acquisitions to occur. A key issue that would need to be worked through is what exactly is being acquired, as most NoSQL platforms are open source / free (most MPP platforms were proprietary). Nonetheless, as the market grows and starts to take a noticeable share of the existing RDBMS market, the major vendors will want a piece of that action and the frenzy will start again. But this is still quite a while away.

 

