Data Munging with Perl


Shipping is just $2.95 for the first book and $1.00 for each additional book!  (For shipments in the US only, with media mail and paypal.)

All the books we sell are NEW, never read.  We may list a book as used if we feel it is not in gift condition due to minor wear on the cover from being on a retail store shelf.  We purchase all our books directly from the publisher. 

Data Munging with Perl

by David Cross
Product Group: Book
Publisher: Manning Publications (2001-01-15)
ISBN: 1930110006
EAN: 9781930110007
Dewey Decimal #: 005.72
Paperback: 300 pages
Edition: 1st
SKU: 080416024
Condition: New
Comments: New, never read; may have minor wear on cover. Other Blender, Microsoft, and O'Reilly computer books available at great prices.


Editorial Reviews


Product Description
The Perl language is well suited for use with "data munging" tasks: those that involve transforming and massaging data. While Perl is commonly used for such tasks, there has been no book focused on the topic of munging. This book covers the basic paradigms of programming and discusses the many techniques that are specific to Perl. It also examines standard data formats such as text, binary, HTML, and XML before giving tips on creating and parsing new structured data formats. Source code downloads and technical support from the authors are available on the publisher's Web site.


Customer Reviews


No-nonsense resource for meat and potatoes Perl scripting
Rating (4)
Date: 2007-07-21

1 out of 1 customers found this review helpful


The quintessential Perl activity is data processing, particularly in a Unix environment, where output is piped into a script from some other program, transformed, and spat out again. Many people's first encounter with Perl will probably be in this task. David Cross's book shows how to do this with the minimum of fuss and the maximum of flexibility. It's not a Perl tutorial, however, so you will need some basic knowledge of Perl; having read The Llama is enough. There is an appendix of 'essential Perl' to refresh your memory if you're a bit rusty.

The book begins by revising some of those basic Perl practices that come in handy for scripting, e.g. command line options, regular expressions and sorting. The second part of the book deals with parsing fairly simple data: traditional fixed-width record data (e.g. the column-based stuff that you often find as the output of old Fortran and C programs), unstructured data (e.g. doing word counts on text files), and formats such as CSV, PNG and MP3. This is the strongest section of the book, and contains lots of useful hands-on information.

The third part of the book deals with more modern forms of data files, in the shape of XML. Parsing HTML also gets a chapter to itself, after the author usefully demonstrates the limitations of any simple solution (e.g. using regexes), which provides pretty strong evidence in favour of the standard 'don't try it yourself, use a CPAN module' argument. The XML chapter itself covers the XML::Parser module in reasonable detail. However, there are now many more XML parsers in Perl out there, and XML::Parser is probably no longer the best solution (Grant McLean's Perl XML FAQ on the net has a good overview of the options). Excluding the seemingly obligatory 'here's a bunch of books and websites to learn more' chapter, the last proper chapter is on parsing and the Parse::RecDescent module, and it's a very good, gentle introduction.

If you're not working in a command line environment, there's not a whole lot here you're going to need. Equally, if you've been doing this sort of thing for a while, there's not much here that will be new to you, and not all the subjects are explored in great depth. Some of it (particularly the XML chapter) is a bit outdated and superficial, so I would knock a star off my rating if you're more interested in the XML/HTML chapters.

But for the simpler tasks, e.g. parsing column-based data, this is recommended. You're shown all the handy tricks you need, such as piping, taking input from standard input as well as files, slurping paragraphs, etc. My 4-star rating applies if this sounds like what you need: it's a clear, short and to-the-point book, which is definitely worth taking with you on your first journey into data munging.


I wish I had purchased this book years ago
Rating (5)
Date: 2007-01-01

1 out of 1 customers found this review helpful


As a DBA, I bought this book to enhance my data manipulation skills with Perl, but I found so much more in this compact book. David Cross provides many excellent code examples and explanations for common, non-database data manipulation tasks, for example working on delimited and fixed-width text files and managing complex data structures in Perl with array and hash refs. David has excellent communication skills: his examples and explanations taught me much about Perl that I did not previously understand completely. I also found Chapter 4, on regular expressions, to be one of the best and most concise. The only downside of this book is that I wish it had more pages to read! Regardless, it's a must-have Perl book.


Belongs on every sysadmin's desk
Rating (5)
Date: 2002-07-02


This book isn't about arcane corners of Perl theory. It's about how to write Perl programs that perform the "simple" task of converting data from one format to another.

Need to get every headline from an RSS feed? Or report the three users with the most processes running, as listed by `ps`? Or extract the first paragraph from each of a thousand HTML files? Or make a .tsv file based on all the "From:" and "Subject:" lines in your mailbox file? If those sorts of tasks sound familiar to you, then this is the book you've been looking for. It has working code for doing these sorts of things, involving lots of different common kinds of formats.

By tech book standards, this book is short (300 pages), but it's clear and direct and to the point -- no bloat here. Every page tells you something you need to know, with useful examples for every idea that it explains.


Valuable for its _clarity_
Rating (5)
Date: 2001-07-25


After reading this book I rewrote a pretty massive PostScript parsing and munging system that I was having a lot of trouble with and felt like I did it the _right_ way. If you follow the author through his examples and actually read the book (which I was able to read almost straight through) I think that you will find yourself with a more long-view approach. And I think that makes this book valuable. And admit it, every time you read through a regex chapter you get a little more in the old noggin...


Good for data-processing *beginners*
Rating (4)
Date: 2001-07-06

17 out of 17 customers found this review helpful


It's a guide. David takes you through the different "data munging" tasks (record-oriented data, binary data, fixed-width data, XML, ...) and shows you his proper ways of dealing with them (or, at least, thinking about them). It's not an encyclopedia of "data munging": the book is 300 pages, and many of them (too many, maybe) are detailed descriptions of useful CPAN modules (which I didn't read as carefully as the rest of the book, since the POD was always enough), so it covers only the usual data processing tasks, letting you go deeper by yourself for more advanced topics. After you finish it, far fewer "data sources" will scare you: the solutions and references are inside.

As I said, it may be good for data-processing beginners, but Perl experts will hardly find lots of new information in it.

P.S. I trust him and therefore follow his advice in every script I start to think of (especially the advice about the "UNIX filter model").

Retail Price: $36.95
Our Price: $13.89
That's 62% Off!