Pete Warden - Data Source Handbook


  • Book:
    Data Source Handbook
  • Author:
    Pete Warden
  • Publisher:
    O'Reilly Media
  • Genre:
    Computer
  • Year:
    2011

Data Source Handbook: summary, description and annotation

If you're a developer looking to supplement your own data tools and services, this concise book covers the most useful sources of public data available today. You'll find useful information on APIs that offer broad coverage, tie their data to the outside world, and are either accessible online or feature downloadable bulk data. You'll also find code and helpful links.

This guide organizes APIs by the subjects they cover, such as websites, people, or places, so you can quickly locate the best resources for augmenting the data you handle in your own service. Categories include:

  • Website tools such as WHOIS, bit.ly, and Compete
  • Services that use email addresses as search terms, including GitHub
  • Finding information from just a name, with APIs such as WhitePages
  • Services, such as Klout, for locating people with Facebook and Twitter accounts
  • Search APIs, including BOSS and Wikipedia
  • Geographical data sources, including SimpleGeo and U.S. Census
  • Company information APIs, such as CrunchBase and ZoomInfo
  • APIs that list IP addresses, such as MaxMind
  • Services that list books, films, music, and products

Data Source Handbook
Pete Warden
Published by O'Reilly Media

Beijing Cambridge Farnham Köln Sebastopol Tokyo

Preface

A lot of new sources of free, public data have emerged over the last few years, and this guide covers some of the most useful. It's aimed at developers looking for information to supplement their own tools or services. There are obviously a lot of APIs out there, so to narrow the field to the most useful, the ones in this guide have to meet these standards:

Free or self-service signup

Traditional commercial data agreements are designed for enterprise companies, so they're very costly and time-consuming to experiment with. APIs that are either free or have a simple sign-up process make it a lot easier to get started.

Broad coverage

Quite a few startups build infrastructure and then hope that users will populate it with data. Most of the time, this doesn't happen, so you end up with APIs that look promising on the surface but actually contain very little useful data.

Online API or downloadable bulk data

Most of us now develop in the web world, so anything else requires a complex installation process that makes it much harder to try out.

Linked to outside entities

There has to be some way to look up information that ties the service's data to the outside world. For example, the Twitter and Facebook APIs don't qualify because you can only find users by internal identifiers, whereas LinkedIn does because you can look up accounts by their real-world names and locations.

I also avoid services that impose excessive conditions on what you can do with the information they provide. Some are on the border of acceptability, so for those I've highlighted any special restrictions on how you can use the data, along with links to the full terms of service.

The APIs are organized by the subject that they cover (for example, websites, people, or places), so you can discover the best sources to augment your data. Please get in touch if you know of services that are missing, or have other questions or suggestions.

Chapter 1. Data Source Handbook
Websites
WHOIS

The whois Unix command is still a workhorse, and I've found the web service a decent alternative, too. You can get the basic registration information for any website. In recent years, some owners have chosen private registration, which hides their details from view, but in many cases you'll see a name, address, email, and phone number for the person who registered the site. You can also enter numerical IP addresses here and get data on the organization or individual that owns that server.

Unfortunately the terms of service of most providers forbid automated gathering and processing of this information, but you can craft links to the Domain Tools site to make it easy for your users to access the information:

Info for www.google.com
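
A minimal sketch of generating such a link in Python. The URL pattern (whois.domaintools.com followed by the hostname) and the function name domaintools_link are assumptions for illustration; the text only says that links to the Domain Tools site can be crafted.

```python
from html import escape
from urllib.parse import quote


def domaintools_link(host):
    """Return an HTML anchor pointing at a WHOIS page for `host`.

    Assumption: Domain Tools serves WHOIS pages at
    http://whois.domaintools.com/<hostname>.
    """
    url = "http://whois.domaintools.com/" + quote(host, safe="")
    # Escape both the attribute value and the visible link text.
    return '<a href="{}">Info for {}</a>'.format(escape(url, quote=True), escape(host))


if __name__ == "__main__":
    print(domaintools_link("www.google.com"))
```

Because the user follows the link in a browser, nothing is gathered automatically, which is the point of this approach.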

There is a commercial API available through whoisxmlapi.com that offers a JSON interface and bulk downloads, which seems to contradict the terms mentioned in most WHOIS results. It costs $15 per thousand queries. Be careful, though; it requires you to send your password as a nonsecure URL parameter, so don't use a valuable one:

curl "http://www.whoisxmlapi.com/whoisserver/WhoisService?\
domainName=oreilly.com&outputFormat=json&userName=&password="

{"WhoisRecord": {
  "createdDate": "26-May-97",
  "updatedDate": "26-May-10",
  "expiresDate": "25-May-11",
  "registrant": {
    "city": "Sebastopol",
    "state": "California",
    "postalCode": "95472",
    "country": "United States",
    "rawText": "O'Reilly Media, Inc.\u000a1005 Gravenstein Highway North \u000aSebastopol, California 95472\u000aUnited States\u000a",
    "unparsable": "O'Reilly Media, Inc.\u000a1005 Gravenstein Highway North"
  },
  "administrativeContact": {
    "city": "Sebastopol",...

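
The same request can be sketched in Python without touching the network: one helper assembles the query URL from the parameters in the curl call, and another pulls a few fields out of a response. The helper names and the choice of fields are illustrative assumptions, and SAMPLE_RESPONSE is a trimmed copy of the output shown above, not a live result.

```python
import json
from urllib.parse import urlencode

# Endpoint exactly as it appears in the curl example.
WHOIS_ENDPOINT = "http://www.whoisxmlapi.com/whoisserver/WhoisService"


def build_whois_url(domain, user_name, password):
    """Assemble the query URL. The credentials travel in the clear."""
    params = {
        "domainName": domain,
        "outputFormat": "json",
        "userName": user_name,
        "password": password,
    }
    return WHOIS_ENDPOINT + "?" + urlencode(params)


def registrant_summary(response_text):
    """Pick a few fields out of a WhoisRecord response (hypothetical helper)."""
    record = json.loads(response_text)["WhoisRecord"]
    registrant = record.get("registrant", {})
    return {
        "created": record.get("createdDate"),
        "expires": record.get("expiresDate"),
        "city": registrant.get("city"),
        "country": registrant.get("country"),
    }


# Trimmed from the sample output in the text.
SAMPLE_RESPONSE = """{"WhoisRecord": {
  "createdDate": "26-May-97",
  "updatedDate": "26-May-10",
  "expiresDate": "25-May-11",
  "registrant": {"city": "Sebastopol", "state": "California",
                 "postalCode": "95472", "country": "United States"}}}"""

if __name__ == "__main__":
    print(build_whois_url("oreilly.com", "", ""))
    print(registrant_summary(SAMPLE_RESPONSE))
```

Using .get() for the nested fields keeps the sketch tolerant of privately registered domains, where some registrant details may be absent.
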
Blekko

The newest search engine in town, Blekko sells itself on the richness of the data it offers. If you type in a domain name followed by /seo, you'll receive a page of statistics on that URL (Figure 1-1).

Figure 1-1. Blekko statistics

Blekko is also very keen on developers accessing its data, so it offers an easy-to-use API through the /json slash tag, which returns a JSON object instead of HTML:

http://blekko.com/?q=cure+for+headaches+/json+/ps=100&auth=&ft=&p=1
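
As a rough illustration, the request URL above can be assembled in Python. The function and parameter names are hypothetical, and treating /ps as a page-size tag and p as a page number is an assumption read off the example URL rather than documented behavior.

```python
from urllib.parse import quote_plus


def blekko_json_url(terms, auth_key="", page_size=100, page=1):
    """Build a Blekko query URL that asks for JSON via the /json slash tag.

    Assumptions: /ps=<n> sets the page size and p=<n> selects the page.
    """
    query = "{} /json /ps={}".format(terms, page_size)
    # Keep "/" and "=" literal so the slash tags survive URL encoding.
    encoded = quote_plus(query, safe="/=")
    return "http://blekko.com/?q={}&auth={}&ft=&p={}".format(
        encoded, quote_plus(auth_key), page
    )


if __name__ == "__main__":
    print(blekko_json_url("cure for headaches"))
```

With an empty key, the call reproduces the example URL; in real use auth_key would carry the key issued by Blekko.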

To obtain an API key, email Blekko. The terms of its API agreement are somewhat restrictive, but the company is flexible in practice:

You should note that it prohibits practically all interesting uses of the blekko API. We are not currently issuing formal written authorization to do things prohibited in the agreement, but, if you are well behaved (e.g., not flooding us with queries), and we know your email address (from when you applied for an API auth key, see above), we will have the ability to attempt to contact you and discuss your usage patterns if needed.

Currently, the /seo results aren't available through the JSON interface, so you have to scrape the HTML to obtain them. There's a demonstration of that at https://github.com/petewarden/pagerankgraph.

bit.ly

The bit.ly API lets you access analytics information for a URL that's been shortened. If you're starting off with a full URL, you'll need to call the lookup function to obtain the short URL. You'll need to sign up for API access. This is most useful if you want to gauge the popularity of a site, either so you can sort and filter links you're displaying to a user or to feed into your own analysis algorithms:

curl "http://api.bit.ly/v3/clicks?login=&apiKey=&\
shortUrl=http://bit.ly/hnB7HI"

{"status_code": 200,
 "data": {
   "clicks": [{
     "short_url": "http://bit.ly/hnB7HI",
     "global_hash": "gKGd7s",
     "user_clicks": 9,
     "user_hash": "hnB7HI",
     "global_clicks": 36}]},
 "status_txt": "OK"}

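
To show how the click counts might feed a sort, here is a small Python sketch that parses a response of the shape shown above and ranks links by global_clicks. The helper names are hypothetical, and SAMPLE_RESPONSE is copied from the example output rather than fetched live.

```python
import json

# Copied from the example response in the text.
SAMPLE_RESPONSE = (
    '{"status_code": 200, "data": {"clicks": [{"short_url": "http://bit.ly/hnB7HI", '
    '"global_hash": "gKGd7s", "user_clicks": 9, "user_hash": "hnB7HI", '
    '"global_clicks": 36}]}, "status_txt": "OK"}'
)


def click_counts(response_text):
    """Map each short URL in a clicks response to its global click count."""
    payload = json.loads(response_text)
    if payload.get("status_code") != 200:
        raise ValueError("bit.ly returned: {}".format(payload.get("status_txt")))
    return {entry["short_url"]: entry["global_clicks"]
            for entry in payload["data"]["clicks"]}


def rank_by_popularity(counts):
    """Return short URLs ordered from most to least clicked."""
    return sorted(counts, key=counts.get, reverse=True)


if __name__ == "__main__":
    counts = click_counts(SAMPLE_RESPONSE)
    print(rank_by_popularity(counts), counts)
```

The sketch uses global_clicks rather than user_clicks on the assumption that the global figure counts clicks across every shortened copy of the same destination, which suits a popularity measure better.
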
Compete

The Compete API gives a very limited amount of information on domains: a trust rating, a ranking for how much traffic a site receives, and any online coupons associated with the site. Unfortunately, you don't get the full traffic history information that powers the popular graphs on the web interface. The terms of service also rate-limit you to 1,000 calls a day, and you can't retain any record of the information you pull, which limits its usefulness:

curl "http://api.compete.com/fast-cgi/MI?d=google.com&ver=3&apikey=&size=large"

google.com
green
http://toolbar.compete.com/trustgreen/google.com
...
1
http://toolbar.compete.com/siteprofile/google.com
......

Delicious

Despite its uncertain future, the Delicious service collects some of the most useful information on URLs I've found. The API returns the top 10 tags for any URL, together with a count of how many times each tag has been used (Figure 1-2).

Figure 1-2. Delicious tags

You don't need a key to use the API, and it supports JSONP callbacks, allowing you to access it even within completely browser-based applications. There's some PHP sample code on GitHub, but the short version is that you call http://feeds.delicious.com/v2/json/urlinfo/data?hash= with the MD5 hash of the URL appended, and you get back a JSON string containing the tags.
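
The hashing step can be sketched in Python. This only builds the request URL; it assumes the URL is hashed exactly as given, encoded as UTF-8 and without any normalization, which the text does not specify. The function name is hypothetical.

```python
import hashlib

# Endpoint prefix as given in the text.
URLINFO_PREFIX = "http://feeds.delicious.com/v2/json/urlinfo/data?hash="


def delicious_urlinfo_url(url):
    """Append the hex MD5 digest of `url` to the urlinfo endpoint.

    Assumption: the service hashes the raw URL string (UTF-8 bytes).
    """
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return URLINFO_PREFIX + digest


if __name__ == "__main__":
    print(delicious_urlinfo_url("http://www.oreilly.com/"))
```

If lookups come back empty, a trailing slash or a difference in letter case in the URL is a likely cause, since any change to the input string yields a different hash.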
