polegone's comments

polegone · on May 9, 2015

I am a novice, and that is one of the reasons why I started this project (to learn).

microtonal · on May 9, 2015

    bytes, _ := ioutil.ReadAll(os.Stdin)
    lines := strings.Split(string(bytes), "\n")

Tip: use bufio.Scanner.

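    scanner := bufio.NewScanner(reader) // reader is any io.Reader, e.g. os.Stdin
    scanner.Split(bufio.ScanLines)      // ScanLines is already the default
    for scanner.Scan() {
        // Do stuff with scanner.Text()
    }
    if err := scanner.Err(); err != nil {
        // Handle the read error
    }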

And you can iterate over the lines 'lazily'. If you don't want to consume \r's, make your own ScanLines :).
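
To illustrate the "make your own ScanLines" remark, here is a minimal sketch of a split function that breaks on '\n' only and leaves any '\r' in the token. The names scanLinesKeepCR and splitKeepCR are made up for this example; the split function signature is the one bufio.Scanner expects.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanLinesKeepCR is a hypothetical split function modeled on bufio.ScanLines.
// It splits on '\n' only and does not strip a trailing '\r' from the token.
func scanLinesKeepCR(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.IndexByte(data, '\n'); i >= 0 {
		// Full line found: return it without the '\n'.
		return i + 1, data[:i], nil
	}
	if atEOF {
		// Final line without a trailing newline.
		return len(data), data, nil
	}
	// No newline yet: ask the Scanner for more data.
	return 0, nil, nil
}

// splitKeepCR collects every token the scanner produces for s.
func splitKeepCR(s string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(s))
	scanner.Split(scanLinesKeepCR)
	var lines []string
	for scanner.Scan() {
		lines = append(lines, scanner.Text())
	}
	return lines, scanner.Err()
}

func main() {
	lines, err := splitKeepCR("a\r\nb\nc")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Printf("%q\n", lines) // prints ["a\r" "b" "c"]
}
```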

Someone · on May 9, 2015

Tail also doesn't always need to read its entire input. When given a file path, it can read backwards from the end of the file until it has found its n lines. That makes tail fast on huge files (as long as they have normal line lengths), but it also complicates the code.

And you may want to mmap the file, rather than open it. Whether that speeds things up depends on OS, OS version, file size, file system, available memory, phase of the moon, etc.

on May 9, 2015

[deleted]

microtonal · on May 9, 2015

You mean bufio.Reader's ReadLine? As the documentation says, it's low-level: it can return a long line in pieces, and the returned slice is only valid until the next read, so you have to manage the copying yourself. ReadBytes/ReadString are nicer interface-wise, but they allocate a new buffer on every call.

I like Scanner because it provides a nice high-level interface, but still maintains an internal buffer, reducing GC pressure. Of course, it's not one size fits all.

polegone · on May 9, 2015

I tested it on a several-megabyte text file (not that large) and I can see a huge improvement in speed when I use a reader vs. loading the whole thing into memory as I did at first.

I can see now how much of a difference it can make on a really large file, like in the gigabyte range.

BTW, I deleted my earlier comment because the problem I had wasn't anything to do with bufio. I had just made an obvious mistake elsewhere in my code, which I've fixed now.

andrewchambers · on May 9, 2015

The problem is that something like tail should not read the whole file into memory. tail works on a 100 gig file even with 100 megs of RAM.

LaFolle · on May 9, 2015

Great going. This will also help other novice Go programmers learning the language, at the same time getting a sense of how to implement their own Unix commands.

polegone · on May 9, 2015

I must say that yours is much more complete than mine, though.

microtonal · on May 9, 2015

Work together!

It would be great if the complete coreutils were implemented!

barsonme · on May 9, 2015

I'd love some help. I'm juggling 3 side projects right now, and it's hard making time for all of them. :)

polegone · on May 9, 2015

I linked it in the readme so all those people visiting my project will see yours.

I can't help much myself (you are way ahead of me), but more people should visit your project now.

barsonme · on May 9, 2015

Thank you! That's very generous of you.

polegone · on May 9, 2015

No problem.

barsonme · on May 9, 2015

I'd love it if our projects could somehow work together. My contact info is in my account's description, so shoot me a message!

polegone · on May 9, 2015

Thank you. I will look into this.

preetamjinka · on May 9, 2015

No problem. Also, please do not ignore errors. They're meant to be handled.

polegone · on May 9, 2015

I'm working on that as well. I just fixed cat to crash and log on error, and I'll be fixing the other commands soon.

By the way, I'm wondering how I should go through the file line by line with a reader.

I think the most efficient way may be to scan byte by byte from the start (head) or the end (tail), counting newlines until I reach n of them (or the end of the file), and then print the bytes between the start/end and the nth newline.

How does this sound?

preetamjinka · on May 9, 2015

Good question. I'm not sure. You might want to seek to the end and move back. You probably shouldn't do it byte-by-byte directly from the file since that's very inefficient. As you can tell, this is already starting to get complicated! Maybe you could try mmaping the file so you could treat it as a []byte.

laumars · on May 9, 2015

Last year I was trying to write a routine in Go that read a file backwards. I was amazed how unexpectedly difficult that proved to be.

In the end I settled for reading it from the start which worked 99.999% of the time and enabled me to finish the project to the tight deadline I had. But I've always meant to go back and "fix" that code at some point.

sridca · on May 9, 2015

> By the way, I'm wondering how I should go through the file line by line with a reader.

Take a look at my Go package that allows you to programmatically do 'tail -f' - https://github.com/activestate/tail

iagooar · on May 9, 2015

The strategy that is used by the original GNU coreutils written in C, and the one I used to implement tail in Rust, is to jump to the end of the file, then rewind AVERAGE_CHARS_PER_LINE * NUMBER_OF_LINES_TO_BE_READ, check if enough lines have been read, and repeat until enough lines have been found.

I found the optimal value of AVERAGE_CHARS_PER_LINE to be around 40 characters, but of course it hugely depends on the file being read.

polegone · on May 9, 2015

I'm just doing this for fun and to learn about Go and Unix at the same time.

I am a beginner, and a lot of my code is inefficient and/or incomplete, but by putting it on the Internet I can get criticism and find out where I went wrong.

For instance, some of you have told me that the way I've been reading files is very inefficient, so now I'll try and do it the correct way.

tantalic · on May 9, 2015

That is a great reason to do this and the best way to learn.

nekopa · on May 9, 2015

That is the best reason. I am glad HN has people like you who are not afraid to put themselves out there and remind us all that this is what being a hacker is: doing stuff for fun and learning.