• Complain

Thomas Mailund [Thomas Mailund] - The Joys of Hashing: Hash Table Programming with C

Here you can read online Thomas Mailund [Thomas Mailund] - The Joys of Hashing: Hash Table Programming with C full text of the book (entire story) in english for free. Download pdf and epub, get meaning, cover and reviews about this ebook. year: 2019, publisher: Apress, genre: Business. Description of the work, (preface) as well as reviews are available. Best literature library LitArk.com created for fans of good reading and offers a wide selection of genres:

Romance novel Science fiction Adventure Detective Science History Home and family Prose Art Politics Computer Non-fiction Religion Business Children Humor

Choose a favorite category and find really read worthwhile books. Enjoy immersion in the world of imagination, feel the emotions of the characters or learn something new for yourself, make an fascinating discovery.

Thomas Mailund [Thomas Mailund] The Joys of Hashing: Hash Table Programming with C

The Joys of Hashing: Hash Table Programming with C: summary, description and annotation

We offer to read an annotation, description, summary or preface (depends on what the author of the book "The Joys of Hashing: Hash Table Programming with C" wrote himself). If you haven't found the necessary information about the book — write in the comments, we will try to find it.

Build working implementations of hash tables, written in the C programming language. This book starts with simple first attempts devoid of collision resolution strategies, and moves through improvements and extensions illustrating different design ideas and approaches, followed by experiments to validate the choices.
Hash tables, when implemented and used appropriately, are exceptionally efficient data structures for representing sets and lookup tables, providing low overhead, constant time, insertion, deletion, and lookup operations.
The Joys of Hashing walks you through the implementation of efficient hash tables and the pros and cons of different design choices when building tables. The source code used in the book is available on GitHub for your re-use and experiments.
What You Will Learn
  • Master the basic ideas behind hash tables
  • Carry out collision resolution, including strategies for handling collisions and their consequences for performance
  • Resize or grow and shrink tables as needed
  • Store values by handling when values must be stored with keys to make general sets and maps
Who This Book Is For
Those with at least some prior programming experience, especially in C programming.

Thomas Mailund [Thomas Mailund]: author's other books


Who wrote The Joys of Hashing: Hash Table Programming with C? Find out the surname, the name of the author of the book and a list of all author's works by series.

The Joys of Hashing: Hash Table Programming with C — read online for free the complete book (whole text) full work

Below is the text of the book, divided by pages. System saving the place of the last page read, allows you to conveniently read the book "The Joys of Hashing: Hash Table Programming with C" online for free, without having to search again every time where you left off. Put a bookmark, and you can go to the page where you finished reading at any time.

Light

Font size:

Reset

Interval:

Bookmark:

Make
Thomas Mailund 2019
Thomas Mailund The Joys of Hashing https://doi.org/10.1007/978-1-4842-4066-3_1
1. The Joys of Hashing
Thomas Mailund
(1)
Aarhus N, Denmark

This book is an introduction to the hash table data structure. When implemented and used appropriately, hash tables are exceptionally efficient data structures for representing sets and lookup tables. They provide constant time, low overhead, insertion, deletion, and lookup. The book assumes the reader is familiar with programming and the C programming language. For the theoretical parts of the book, it also assumes some familiarity with probability theory and algorithmic theory.

Hash tables are constructed from two basic ideas: reducing application keys to a hash key, a number in the range from 0 to some N 1, and mapping that number into a smaller range from 0 to m 1, mN. We can use the small range to index into an array with constant time access. Both ideas are simple, but how they are implemented in practice affects the efficiency of hash tables.

Consider Figure . This figure illustrates the main components of storing values in a hash table: application values, which are potentially complex, are mapped to hash keys, which are integer values in a range of size N, usually zero to N 1. In the figure, N = 64. Doing this simplifies the representation of the values; now you only have integers as keys, and if N is small, you can store them in an array of size N. You use their hash keys as their index into the array. However, if N is large, this is not feasible. If, for example, the space of hash keys is 32-bit integers, then N = 4, 294, 967, 295, slightly more than four billion. An array of bytes of this size would, therefore, take up more than four gigabytes of space. To be able to store pointers or integers, simple objects, you would need between four and eight times as much memory. It is impractical to use this size of an array to store some application keys.
Figure 1-1 Values map to hash keys that then map to table bins Even if N is - photo 1
Figure 1-1

Values map to hash keys that then map to table bins

Even if N is considerably smaller than four-byte words, if you plan to store nN keys, you waste a lot of space to have the array. Since this array needs to be allocated and initialized, merely creating it will cost you O(N) . Even if you get constant time insertion and deletion into such an array, the cost of producing it can easily swamp the time your algorithm will spend while using the array. If you want a table that is efficient, you should be able to both initialize it and use it to insert or delete n keys, all in time O(n). Therefore, N should be in O(n).

The typical solution to this is to keep N large but have a second step that reduces the hash key range down to a smaller bin range of size m with mO(n); in the example, you use m = 8. If you keep m small, as in O(n), you can allocate and initialize it in linear time, and you can get any bin in it in constant time. To insert, check, or delete an element in the table, you map the application value to its hash key and then map the hash key to a bin index.

You reduce values to bin indices in two steps because the first step, mapping data from your application domain to a number, is program-specific and cannot be part of a general hash table implementation. Moving from large integer intervals to smaller, however, can be implemented as part of the hash table. If you resize the table to adapt it to the number of keys you store in it, you need to change m. You do not want the application programmer to provide separate functions for each m. You can think of the hash key space, [N] = [0, ... , N 1], as the interface between the application and the data structure. The hash table itself can map from this space to indices in an array, [m] = [0, ... , m 1].

The primary responsibility of the first step is to reduce potentially complicated application values to simpler hash keys , such as to map application-relevant information like positions on a board game or connections in a network down to integers. These integers can then be handled by the hash table data structure. A second responsibility of the function is to make the hash keys uniformly distributed in the range [N]. The binning strategy for mapping hash keys to bins assumes that the hash keys are uniformly distributed to distribute keys into bins evenly. If this is violated, the data structure does not guarantee (expected) constant time operations. Here, you can add a third, middle step that maps from [N] [N] and scrambles the application hash keys to hash keys with a better distribution; see Figure . Having a middle step does not eliminate the need for application hash functions. You still need to map complex data into integers. The middle step only alleviates the need for an even distribution of keys. The map from application keys to hash keys still has some responsibility for this, though. If it maps different data to the same hash keys, then the middle step cannot do anything but map the same input to the same output.
Figure 1-2 If the application maps values to keys but they are not uniformly - photo 2
Figure 1-2

If the application maps values to keys, but they are not uniformly distributed, then a hashing step between the application and the binning can be added

Strictly speaking, you do not need the distribution of hash keys to be uniform as long as the likelihood of two different values mapping to the same key is highly unlikely. The goal is to have uniformly distributed hash keys, as these are easiest to work with when analyzing theoretical performance. The runtime results referred to in Chapter , you will learn techniques for achieving similar results without the assumption.

The book is primarily about implementing the hash table data structure and only secondarily about hash functions . The concerns when implementing hash tables are these: given hash keys with application values attached to them, how do you represent the data such that you can update and query tables in constant time? The fundamental idea is, of course, to reduce hash keys to bins and then use an array of bins containing values. In the purest form, you can store your data values directly in the array at the index the hash function and binning functions provide but if m is relatively small compared to the number of data values, then you are likely to have collisions: cases where two hash keys map to the same bin. Although different values are unlikely to hash to the same key in the range [N], this does not mean that collisions are unlikely in the range [m] if m is smaller than N (and as the number of keys you insert in the table, n, approaches m, collisions are guaranteed). Dealing with collisions is a crucial aspect of implementing hash tables , and a topic we will deal with for a sizeable part of this book.

Footnotes

In some textbooks, you will see the hashing step and the binning step combined and called hashing. Then you have a single function that maps application-specific keys directly to bins. I prefer to consider this as two or three separate functions, and it usually is implemented as such.

Next page
Light

Font size:

Reset

Interval:

Bookmark:

Make

Similar books «The Joys of Hashing: Hash Table Programming with C»

Look at similar books to The Joys of Hashing: Hash Table Programming with C. We have selected literature similar in name and meaning in the hope of providing readers with more options to find new, interesting, not yet read works.


Reviews about «The Joys of Hashing: Hash Table Programming with C»

Discussion, reviews of the book The Joys of Hashing: Hash Table Programming with C and just readers' own opinions. Leave your comments, write what you think about the work, its meaning or the main characters. Specify what exactly you liked and what you didn't like, and why you think so.