In previous posts, we’ve considered various aspects of book marketing from the point of view of the author or marketer. In this post, we’re going to shift gears slightly and talk about social media from a different perspective, that of the underlying data.

So before we proceed any further, consider this a Nerd Alert! We’re going to get a bit technical here. If talk of databases and search engines doesn’t excite you, now might be a good time to bail out.

For those of you still here (Hey, it’s everyone! Cool!), here’s a very high-level description of how we approach the challenges of Big Data as it pertains to the publishing world. Our goal is to match books to the social conversation occurring around those books, as well as around related books and topics. To do this, we monitor the social web continuously, identifying the people, places, and conversations where those books and topics are being discussed. We search across multiple channels, including Twitter, Goodreads, Facebook, Pinterest, and the blogosphere (not all of these channels are currently active in our application), and we apply our homegrown algorithms to distinguish the most relevant matches from the least relevant.
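For the curious, here is a deliberately tiny sketch of what "distinguishing the most relevant matches" can look like. It is not our production algorithm, only an illustration under simple assumptions: relevance is the fraction of a book's search terms that show up in a post, and the names `tokenize`, `relevance`, and `rank_posts` are made up for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens as a set."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def relevance(book_terms, post_text):
    """Fraction of the book's terms found in the post (0.0 to 1.0).

    Assumption: plain term overlap stands in for a real relevance model.
    """
    terms = tokenize(" ".join(book_terms))
    if not terms:
        return 0.0
    return len(terms & tokenize(post_text)) / len(terms)


def rank_posts(book_terms, posts):
    """Return (score, post) pairs, most relevant first, dropping zero scores."""
    scored = [(relevance(book_terms, post), post) for post in posts]
    matches = [pair for pair in scored if pair[0] > 0]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample data, for illustration only.
book_terms = ["Moby Dick", "Melville"]
posts = [
    "Lunch was great today",
    "Loving this whale book, Moby Dick",
    "Just finished Moby Dick, Melville is a genius",
]
ranked = rank_posts(book_terms, posts)
```

In this toy run, the post that mentions both the title and the author ranks first, the post that mentions only the title ranks second, and the lunch post is dropped. A real system would also weigh things like the channel, the author of the post, and how recent it is.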

What constitutes relevant social activity varies from channel to channel. For example, a conversation might be defined by a Twitter hashtag or a discussion thread on a blog. A place might be a Goodreads group or a Facebook author page: online sites where interested readers congregate, indicating their likes and (sometimes) dislikes.

Our mechanism for searching also varies from channel to channel. Our custom-developed tools access social data through a combination of traditional search engine technologies, public and private APIs, and licensed data sources.

Once we identify relevant social conversation, we store it and run an initial series of analytics to index it, so that we can retrieve it quickly when we want to display that content to a user in one of our applications. (For the database geeks out there, we use a hybrid, cloud-based relational and graph database tier to optimize our storage and local queries.)

The process of searching, indexing, and storing data occurs at different frequencies, as appropriate for each channel. Twitter data changes quite rapidly – second by second, in fact! – so we keep an eye on the “firehose” of Tweets on a real-time basis. (It’s possible for a user to see a Tweet in the FMA application that was posted less than 10 seconds before.) For other social networks that fluctuate less rapidly, we may search and store content daily, weekly, or even monthly. In all cases, we try to stay as current with social conversation as is warranted by the channel in order to bring the most value to our user.
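To make the idea of per-channel frequencies concrete, here is a minimal sketch of a refresh schedule. The intervals and the channel-to-interval pairings below are assumptions chosen for illustration, not our actual configuration, and `REFRESH_INTERVALS`, `is_due`, and `channels_due` are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical refresh intervals per channel.
# None marks a channel that is streamed in real time rather than polled.
REFRESH_INTERVALS = {
    "twitter": None,
    "goodreads": timedelta(days=1),
    "blogs": timedelta(weeks=1),
    "pinterest": timedelta(days=30),
}


def is_due(channel, last_run, now):
    """Return True if a polled channel should be searched again."""
    interval = REFRESH_INTERVALS[channel]
    if interval is None:
        return False  # streamed continuously, so never batch-polled
    return last_run is None or now - last_run >= interval


def channels_due(last_runs, now):
    """List the channels whose refresh interval has elapsed, alphabetically."""
    return sorted(
        channel
        for channel in REFRESH_INTERVALS
        if is_due(channel, last_runs.get(channel), now)
    )


# Hypothetical bookkeeping: when each channel was last searched.
now = datetime(2024, 1, 10)
last_runs = {
    "goodreads": datetime(2024, 1, 9, 12),  # 12 hours ago, not yet due
    "blogs": datetime(2024, 1, 1),          # 9 days ago, due
}
due = channels_due(last_runs, now)
```

Here `due` contains the blogs (last searched more than a week ago) and Pinterest (never searched), while Twitter is skipped because it is handled by the real-time stream instead of a scheduled search.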

The net result of the process described above is a vast database of information about social network activity related to books and the written word, and a set of tools that can be employed both at fixed intervals and in “real-time” to search social networks for relevant conversations.

It’s a rich trove of information that forms the data foundation of our business and powers our user applications. We’ve always believed that an author or book marketer shouldn’t have to wrestle with all that data alone, and we are more than happy to help, hopefully cutting days to hours, hours to minutes, and minutes to seconds when it comes to finding interested readers on the social web.
