Hacker News | polegone's comments

I am a novice, and that is one of the reasons why I started this project (to learn).


    bytes, _ := ioutil.ReadAll(os.Stdin)
    lines := strings.Split(string(bytes), "\n")
Tip: use bufio.Scanner.

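    scanner := bufio.NewScanner(reader) // reader is any io.Reader, e.g. os.Stdin
    scanner.Split(bufio.ScanLines)      // the default split function, spelled out here
    for scanner.Scan() {
      // Do stuff with scanner.Text()
    }
    if err := scanner.Err(); err != nil {
      // Handle the read error
    }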
And you can iterate over the lines 'lazily'. If you don't want to consume \r's, make your own ScanLines :).


Tail also doesn't always need to read its entire input. When given a file path, it can read backwards from the end of the file until it has found its n lines. That makes tail fast on huge files (as long as they have normal line lengths), but it also complicates the code.

And you may want to mmap the file, rather than read it. Whether that speeds things up depends on OS, OS version, file size, file system, available memory, phase of the moon, etc.


[deleted]


You mean bufio.Reader.ReadLine? As the documentation says, it's low-level, since you have to do the buffer allocation yourself. ReadBytes/ReadString are nicer interface-wise, but they allocate a new buffer on every call.

I like Scanner because it provides a nice high-level interface, but still maintains an internal buffer, reducing GC pressure. Of course, it's not one size fits all.


I tested it on a several-megabyte text file (not that large), and I can see a huge improvement in speed when I use a reader vs. loading the whole thing into memory as I did at first.

I can see now how much of a difference it can make on a really large file, like in the gigabyte range.

BTW, I deleted my earlier comment because the problem I had wasn't anything to do with bufio. I had just made an obvious mistake elsewhere in my code, which I've fixed now.


The problem is that something like tail should not read the whole file into memory. tail works on a 100 GB file even with 100 MB of RAM.


Great going. This will also help other novice Go programmers learn the language while getting a sense of how to implement their own Unix commands.


I must say that yours is much more complete than mine, though.


Work together!

It would be great if the complete coreutils were implemented!


I'd love some help. I'm juggling 3 side projects right now, and it's hard making time for all of them. :)


I linked it in the readme so all those people visiting my project will see yours.

I can't help much myself (you are way ahead of me), but more people should visit your project now.


Thank you! That's very generous of you.


No problem.


I'd love it if our projects could somehow work together. My contact info is in my account's description, so shoot me a message!


Thank you. I will look into this.


No problem. Also, please do not ignore errors. They're meant to be handled.


I'm working on that as well. I just fixed cat to crash and log on error, and I'll be fixing the other commands soon.

By the way, I'm wondering how I should go through the file line by line with a reader.

I think the most efficient way may be to scan byte by byte from the start (head) or the end (tail), counting until I reach n newlines (or the end of the file), then print the bytes between the start/end and the nth newline.

How does this sound?


Good question. I'm not sure. You might want to seek to the end and move back. You probably shouldn't do it byte-by-byte directly from the file since that's very inefficient. As you can tell, this is already starting to get complicated! Maybe you could try mmapping the file so you could treat it as a []byte.


Last year I was trying to write a Go routine that read a file backwards. I was amazed how unexpectedly difficult that proved to be.

In the end I settled for reading it from the start which worked 99.999% of the time and enabled me to finish the project to the tight deadline I had. But I've always meant to go back and "fix" that code at some point.


> By the way, I'm wondering how I should go through the file line by line with a reader.

Take a look at my Go package that allows you to programmatically do 'tail -f' - https://github.com/activestate/tail


The strategy that is used by the original GNU coreutils written in C, and the one I used to implement tail with Rust, is to jump to the end of the file, then rewind AVERAGE_CHARS_PER_LINE * NUMBER_OF_LINES_TO_BE_READ, check if enough lines have been read, and repeat until enough lines have been found.

I found the optimal value of AVERAGE_CHARS_PER_LINE to be around 40 characters, but of course it hugely depends on the file being read.


I'm just doing this for fun and to learn about Go and Unix at the same time.

I am a beginner, and a lot of my code is inefficient and/or incomplete, but by putting it on the Internet I can get criticism and find out where I went wrong.

For instance, some of you have told me that the way I've been reading files is very inefficient, so now I'll try and do it the correct way.


That is a great reason to do this and the best way to learn.


That is the best reason. I am glad HN has people like you who are not afraid to put yourselves out there and remind us all that this is what being a hacker is: doing stuff for fun and learning.

