Sync Android App Preferences with Google Drive

Google Drive recently released a feature called "Application Data Folder" that lets developers store application-related data on Google Drive. The Application Data Folder, or appdata, gives every application a hidden folder on the user's Drive for data you don't want the user to view or modify. This allows applications to store user preferences or other well-formed, application-specific data on Google Drive without risking manipulation by the user.

Before this year's I/O, we thought it would be a great idea to demonstrate the capabilities of appdata to Android developers. Since every Android phone is already signed in with a Google account, syncing with Google Drive should be a fairly smooth experience for Android users. That's how I came up with an application preferences syncer that automatically syncs a SharedPreferences key-value store with Google Drive. Your preferences now get distributed to multiple devices automatically!

AppdataPreferences for Android SDK is now available on GitHub. Let me illustrate how it works. First, let your user pick a Google Account, then initialize a GoogleAccountCredential with the appdata scope:

GoogleAccountCredential credential =
    GoogleAccountCredential.usingOAuth2(this, "https://www.googleapis.com/auth/drive.appdata");
credential.setSelectedAccountName(googleAccountName);

Bind a SharedPreferences key-value store to the user's account:

// the local key-value store to keep in sync
SharedPreferences preferences = getSharedPreferences("preferences", MODE_PRIVATE);
AppdataPreferences syncer = AppdataPreferences.get(getApplicationContext());
// credential is the GoogleAccountCredential created above
syncer.bind(credential, preferences);

Go and edit the preferences as you used to do:

preferences.edit()
    .putString("name", "burcu")
    .putString("favoriteColor", "#cccccc")  // a color string, so putString rather than putBoolean
    .commit();

Your changes will be synced with the preferences stored in your appdata folder on Google Drive, so you can access them from other devices. In a multi-device scenario, you can also listen for changes by subscribing with an OnChangeListener:

syncer.setOnChangeListener(new OnChangeListener() {
  @Override
  public void onChange(SharedPreferences prefs) {
    // preferences are changed, maybe refresh the screen
    // maybe alter the logic
  }
});
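
To make the listener idea concrete, here is a minimal plain-Java sketch of a key-value store that merges incoming remote values and notifies a listener only when something actually changed. It is not how AppdataPreferences is implemented: the class PreferenceSyncSketch, its ChangeListener interface, and the "remote wins" merge rule are all assumptions made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical illustration only; not part of the AppdataPreferences library.
class PreferenceSyncSketch {
    /** Called after a merge that changed at least one local value. */
    interface ChangeListener {
        void onChange(Map<String, String> prefs);
    }

    private final Map<String, String> local = new HashMap<>();
    private ChangeListener listener;

    void setChangeListener(ChangeListener listener) {
        this.listener = listener;
    }

    void put(String key, String value) {
        local.put(key, value);
    }

    String get(String key) {
        return local.get(key);
    }

    /**
     * Copies remote values over local ones (assumption: remote wins on conflict).
     * Returns true and notifies the listener if any value changed.
     */
    boolean mergeRemote(Map<String, String> remote) {
        boolean changed = false;
        for (Map.Entry<String, String> entry : remote.entrySet()) {
            String previous = local.put(entry.getKey(), entry.getValue());
            if (!Objects.equals(previous, entry.getValue())) {
                changed = true;
            }
        }
        if (changed && listener != null) {
            listener.onChange(new HashMap<>(local)); // hand out a copy
        }
        return changed;
    }

    public static void main(String[] args) {
        PreferenceSyncSketch store = new PreferenceSyncSketch();
        store.put("name", "burcu");
        store.setChangeListener(prefs -> System.out.println("changed: " + prefs.keySet()));

        Map<String, String> remote = new HashMap<>();
        remote.put("favoriteColor", "#cccccc");
        store.mergeRemote(remote); // listener fires
        store.mergeRemote(remote); // nothing new, listener stays quiet
    }
}
```

Notifying only on real changes keeps the UI from refreshing on every sync round-trip; a real syncer would also have to deal with deletions and conflicting edits, which this sketch ignores.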

You can force-sync or preload with:

syncer.sync();

That's it! The following repos are open source under Google Drive's GitHub organization; I highly encourage you to take a look and give them a try.

Google+ Sign In for Express

We've recently published a Node.js client library that lets the Node community talk to Google APIs in a more pleasant way. As a quick starter demo, I've implemented a middleware for Connect that helps you add Google+ Sign In to your Connect-powered projects (such as Express web apps) with a few lines of code.

The sample middleware is called plussignin and is available at burcu/node-plussignin. Assume you have an existing Express project or are creating a new one. There are two additional steps to enable Google+ Sign In:

  • Add the express.cookieParser, express.session and plussignin middlewares to your app.
  • Configure plussignin with your client ID, client secret, redirect URI and required scopes. (Client ID and secret are available on the API Console.)

plussignin adds the following routes to your application and handles the OAuth 2.0 flow for your app automagically.

  • /login: Redirects the user to the authorization dialog and asks for confirmation.
  • /logout: Removes the user and their profile from the session, logging the user out.
  • /pluscallback: When a user grants permissions to your app on the auth dialog, this endpoint is called. plussignin exchanges tokens with Google's OAuth 2.0 endpoints to retrieve an access token. Once an access token is acquired, it makes a request to retrieve the user's profile. Once this flow completes successfully, it puts the user and the user's profile into the session and redirects the logged-in user to the homepage.
  • /error: If an error occurs during the OAuth 2.0 flow, the user is redirected to /error.
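
To show what the /login redirect boils down to, here is a small Java sketch that builds an OAuth 2.0 authorization URL from a client ID, redirect URI and scopes. It is only an illustration of the first step of the authorization code flow, not plussignin's actual code; the class and method names (AuthUrlSketch, buildAuthUrl) are made up, and the endpoint used is Google's documented authorization endpoint.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; plussignin itself is JavaScript and may build the URL differently.
class AuthUrlSketch {
    // Google's OAuth 2.0 authorization endpoint.
    static final String AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth";

    /** Builds the URL the user is redirected to in order to grant access. */
    static String buildAuthUrl(String clientId, String redirectUri, String... scopes) {
        String scope = String.join(" ", scopes); // scopes are space-separated
        return AUTH_ENDPOINT
            + "?response_type=code" // ask for an authorization code
            + "&client_id=" + encode(clientId)
            + "&redirect_uri=" + encode(redirectUri)
            + "&scope=" + encode(scope);
    }

    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        System.out.println(buildAuthUrl(
            "YOUR_CLIENT_ID_HERE",
            "http://localhost:3000/pluscallback",
            "https://www.googleapis.com/auth/plus.login"));
    }
}
```

After the user approves, Google redirects back to the redirect URI with a code parameter, which is what the /pluscallback route exchanges for an access token.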

The following snippet illustrates a sample usage.

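// Assumes Express 3.x, where cookieParser and session ship with express.
var express = require('express');
// Assumption: plussignin.js sits next to this file; adjust the path to your checkout.
var plussignin = require('./plussignin');
var MemoryStore = express.session.MemoryStore;

var CLIENT_ID = 'YOUR_CLIENT_ID_HERE',
    CLIENT_SECRET = 'YOUR_CLIENT_SECRET_HERE',
    REDIRECT_URI = 'http://localhost:3000/pluscallback',
    SCOPES = [
      'https://www.googleapis.com/auth/plus.login'];

var app = express();
app.use(express.cookieParser('something secret'));
app.use(express.session({ secret: 'yet another secret', store: new MemoryStore() }));
app.use(plussignin({ clientId: CLIENT_ID, clientSecret: CLIENT_SECRET, redirectUri: REDIRECT_URI, scopes: SCOPES }));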

// renders the homepage
app.get('/', function(req, res) {
  res.render('index', { plus: req.plus });
});

app.listen(3000);
console.log('Listening on port 3000...');

Some more good news: req objects are extended with several utilities.

  • req.plus.isLoggedIn is true if there is a user in the session.
  • req.plus.oauth2 is a googleapis.OAuth2Client.
  • req.plus.profile is the user's profile object.
  • req.plus.people.get({ userId: '' }); returns a regular googleapis Request object.

Note: Many asked why this is not a module. Answer: It's not production-ready yet. I'm willing to clean it up, add some other essential features, and release it as a module.

Cluster-based Recommendation with Mahout

Mahout includes a few new experimental recommenders that are poorly documented at the moment. One of them is TreeClusteringRecommender, which clusters your model into a set of groups and makes recommendations based on distances between your users and items in these clusters.

A cluster-based recommendation may be a good choice if your data is sparse and too weakly correlated to reveal obvious patterns. Another advantage is that clustering may help you provide recommendations to users even when very little data is available for them. However, it decreases the level of personalization: the output is not unique to users but to clusters.

Here is a quick start to create and run one:

// model is an existing DataModel built from your preference data
UserSimilarity similarity = new LogLikelihoodSimilarity(model);
ClusterSimilarity clusterSimilarity = new FarthestNeighborClusterSimilarity(similarity);
// declared as TreeClusteringRecommender so that getCluster() is available
TreeClusteringRecommender rec = new TreeClusteringRecommender(model, clusterSimilarity, 20);

rec.getCluster(1); // gets the cluster of userId=1
rec.recommend(1, 10); // recommends 10 items to userId=1

What about user similarity and cluster similarity?

  • You have to provide a similarity function to measure distances between different users. You may represent each user as a feature vector and compute a Euclidean, Pearson or cosine distance (among others) over these features. LogLikelihoodSimilarity may work as well; you may want to look at Surprise and Coincidence to understand what's under the hood of this similarity.
  • Cluster similarity is newly introduced here. It's the place to customize how the similarity between two clusters is measured. There are already two implementations: NearestNeighborClusterSimilarity and FarthestNeighborClusterSimilarity. Beware that clusters are dynamic: as new data comes in, old clusters may be merged and new ones may be introduced.
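
The difference between the two cluster similarities is easier to see in code. The sketch below is plain Java, not Mahout code: it assumes users are represented as feature vectors compared with cosine similarity, and it follows my understanding of the two strategies, where "nearest neighbor" scores two clusters by their most similar pair of users and "farthest neighbor" by their least similar pair. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of two cluster-similarity strategies; not Mahout's classes.
class ClusterLinkageSketch {
    /** Cosine similarity between two non-zero feature vectors of equal length. */
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Nearest neighbor: similarity of the most similar pair across the clusters. */
    static double nearestNeighbor(List<double[]> c1, List<double[]> c2) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] u : c1) {
            for (double[] v : c2) {
                best = Math.max(best, cosine(u, v));
            }
        }
        return best;
    }

    /** Farthest neighbor: similarity of the least similar pair across the clusters. */
    static double farthestNeighbor(List<double[]> c1, List<double[]> c2) {
        double worst = Double.POSITIVE_INFINITY;
        for (double[] u : c1) {
            for (double[] v : c2) {
                worst = Math.min(worst, cosine(u, v));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        List<double[]> c1 = Arrays.asList(new double[] {1, 0});
        List<double[]> c2 = Arrays.asList(new double[] {1, 0}, new double[] {0, 1});
        System.out.println("nearest:  " + nearestNeighbor(c1, c2));
        System.out.println("farthest: " + farthestNeighbor(c1, c2));
    }
}
```

The farthest-neighbor rule is the stricter of the two: two clusters only look similar if every pair of their users is similar, which tends to produce tighter clusters at merge time.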

The rest is experimental work: finding a method that fits your data by analyzing the nature of your input, plotting the results and evaluating. The initial clustering takes a lot of time compared to Mahout's item- or user-similarity-based recommenders, yet it works acceptably online once you start working with pre-computed data. Even though it's not very mature, you may still want to consider this recommender as a primitive start if what you want to achieve fits clustering.

The Rise of Open Science

Open science is opening up the way we do science. It stands for transparency and public accessibility of scientific data, collaboration, methods and results. It also supports public contribution to the current state of science, and giving the results back to the public domain.

Motivation

When we do science, we rely on older publications and methods that were often published without open access to the data. Years ago, the academic community began to question the credibility of research in the existing literature. The way science is funded was one of the chief reasons behind this skepticism. Science done with non-open data could easily be steered by politics and other funding authorities, such as private companies, to misrepresent facts about topics such as global warming or the side effects of a new medicine. The openness discussion in turn led to other ideas, such as opening up the methods and the scientific source code.

Why open data, open tools and open results?

Being open and accessible was one of the core values of science. Ironically, science today receives heavy financial support from private institutions and governments, where much of the budget is shaped by economic, industrial or military needs. Scientific institutions are mostly closed to people without PhDs for scientist roles because there is already huge competition among PhDs. Our credibility is measured by the number of papers we publish and the number of citations we receive. I wouldn't want to slander scientists, but professional science, as its own closed ecosystem, conflicts in a few ways with the key foundations of science. Science's direction, subjects, people and results are controlled, or could be controlled, by authority. In the next few decades, we have to rethink the way we sustain science.

We also have a verification problem with science that relies on data. Computational and statistical science struggles to reproduce the final results advertised in publications. JASA (Journal of the American Statistical Association) reports that only 21% of papers were published with open source code in 2011, still an improvement over 9% in 2006. Without code or data, even if the work is published in an academic journal, there is no way to validate or iterate on the existing findings.

One of the key problems we can address is that scientific research is not maintainable without economic sustainability, because of the need for scientific tools. I watched Eri Gentry, the founder of BioCurious, at OSCON last year. Her key points about opening up scientific tools, from a self-maker's point of view, were motivating. According to her, at some point BioCurious needed a PCR machine, which cost several thousand dollars, to keep their garage-based research going. Since they couldn't afford the machine, they decided to analyze how these machines actually work. Fortunately, they figured it out and created OpenPCR. Now you are able to copy a strawberry DNA sequence or do cancer research at home. An open repository of knowledge on making scientific tools will increase collaboration from regular makers and DIY people who might otherwise never have the chance to investigate or reverse engineer these tools.

Collaborative Science

With radical changes in the means of communication, discovery and discussion will have to change radically as well. A few months ago, I saw a book by Michael Nielsen called Reinventing Discovery: The New Era of Networked Science in the new arrivals section. Nielsen opens the first chapter with a 2009 story about Tim Gowers' Polymath Project. Tim Gowers is a very notable mathematician, a Fields medalist from Cambridge University. In 2009, instead of working alone or with his existing peers, he decided to discuss a mathematical problem on his blog and asked readers to share their ideas online. In 6 weeks, he received 800 comments from 27 people. Although the start had its pitfalls, 37 days later Gowers announced that they had solved not just his problem but a generalization of the Polymath problem that includes it as a special case.

And what about citizen science? Citizen science used to be perceived as a more professional form of scientific crowdsourcing, but this perception seems to be changing. Very recently, I had a few discussions with friends who are total strangers to citizen science and its current initiatives. They initially questioned the need for citizen scientists. Our main talk was about the classification of galaxies on GalaxyZoo. GalaxyZoo is an online tool that shows you images of galaxies taken by the Hubble telescope and asks you to manually choose whether a galaxy is elliptical or spiral, or whether it has a certain set of features. Any programmer would initially ask why we are doing this classification manually in the 2010s. Honestly, we have the technology to pick up the features directly from the signal without any observation by a human eye. So why bother? Because discovery is not classification. We don't actually know what we are looking at. Any anomaly or strange-looking object could be a new scientific discovery. By reviewing the existing images, GalaxyZoo members discovered a new type of galaxy, which we now call "pea galaxies", and Hanny van Arkel, a Dutch school teacher, discovered a strange green nebula-like object the size of the Milky Way Galaxy, called Hanny's Voorwerp, in 2007.

So, why aren't we taking it any further? There is an ongoing effort to make a cultural shift that increases awareness of and participation in science. It is not only the Zooniverse projects: NASA opened code.nasa.gov very recently. Ariel Waldman has been keeping a directory of all citizen space exploration projects on spacehack.org for a while. LHC's ongoing CMS project donated data to Science Hack Day participants to let data hackers come up with data visualization tools for CMS. DIYgenomics is crowdsourcing genomic data. The list goes on...

Conclusions

With the ongoing momentum in scientific communities, in the next few decades we'll experience a tremendous change in the way we do and participate in science. For now, without disrupting conventional means but by creating new possibilities, a new science is approaching with strong sympathy for making scientific results freely and universally accessible.