Get the first X characters from the cat command?

I have a text file I’m outputting to a variable in my shell script. I only need the first 50 characters however.

I’ve tried using cat ${filename} cut -c1-50 but I’m getting far more than the first 50 characters. That may be because cut works on lines (I’m not 100% sure), while this text file could be one long string; it really depends.

Is there a utility out there I can pipe into to get the first X characters from a cat command?

Answers:


Method 1

head -c 50 file

This returns the first 50 bytes.

Note that head -c is not implemented the same way on every OS.
On Linux and macOS it behaves as described.
On Solaris 11 you need to use the GNU version in /usr/gnu/bin/.
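Since the question wants the result in a variable, here is a minimal sketch that captures the output of head -c with command substitution. The temporary file and its contents are made up for the demo. Keep in mind that $(...) strips trailing newlines, so if byte 50 happens to be a newline the variable ends up one character shorter.

```shell
# Hypothetical demo input: "abcdefghij" repeated 8 times (80 bytes, no newline).
# %.0s consumes each argument without printing it, so the format is repeated 8 times.
sample=$(mktemp)
printf 'abcdefghij%.0s' 1 2 3 4 5 6 7 8 > "$sample"

# Keep only the first 50 bytes (ASCII input, so 50 bytes = 50 characters).
first50=$(head -c 50 "$sample")
printf '%s\n' "$first50"

rm -f "$sample"
```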

Method 2

Your cut command works if you use a pipe to pass data to it:

cat ${file} | cut -c1-50

Or, avoiding a useless use of cat and making it a little safer:

cut -c1-50 < "$file"

Note that the commands above will print the first 50 characters (or bytes, depending on your cut implementation) of each input line. It should do what you expect if, as you say, your file is one huge line.
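To see the per-line behaviour, here is a small sketch using a made-up two-line file and 5 columns instead of 50 for brevity. If you only want the start of the first line, one option (not part of the original answer) is to run head -n 1 before cut.

```shell
# Hypothetical two-line input.
sample=$(mktemp)
printf 'first line of text\nsecond line of text\n' > "$sample"

# cut applies -c1-5 to EVERY line, so both lines are trimmed and both are printed.
per_line=$(cut -c1-5 < "$sample")
printf '%s\n' "$per_line"

# Keeping only line 1 first yields a single 5-character result.
first_line=$(head -n 1 < "$sample" | cut -c1-5)
printf '%s\n' "$first_line"

rm -f "$sample"
```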

Method 3

dd status=none bs=1 count=50 if="${filename}"

This returns the first 50 bytes.
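With bs=1 count=50, dd issues fifty 1-byte reads, so it still collects all 50 bytes when reading from a pipe, where a single larger read may return fewer bytes than requested. A minimal sketch, assuming a dd that supports status=none (GNU coreutils dd does); the piped input is invented for the demo:

```shell
# Hypothetical input: 80 ASCII bytes arriving through a pipe instead of a file.
# status=none suppresses dd's transfer statistics on stderr.
first50=$(printf 'abcdefghij%.0s' 1 2 3 4 5 6 7 8 | dd status=none bs=1 count=50)
printf '%s\n' "$first50"
```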

Method 4

Most answers so far assume that 1 byte = 1 character, which may not be the case if you are using a non-ASCII locale.

A slightly more robust way to do it:

testString=$(head -c 200 < "${filename}") &&
  printf '%s\n' "${testString:0:50}"

Note that this assumes:

  1. You are using ksh93, bash (or a recent zsh or mksh; note that the only multi-byte charset mksh supports is UTF-8, and only after set -o utf8-mode) and a version of head that supports -c (most do nowadays, though it is not strictly standard).
  2. The current locale uses the same encoding as the file (run locale charmap and file -- "$filename" to check); if not, set it, e.g. with LC_ALL=en_US.UTF-8.
  3. I took the first 200 bytes of the file with head because UTF-8 encodes each character on at most 4 bytes, so the first 50 characters always fit in the first 4 × 50 = 200 bytes. This should cover most cases I can think of.
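The byte/character difference is easy to see with a non-ASCII sample. Here is a minimal bash sketch of the same approach scaled down to 5 characters (so 4 × 5 = 20 bytes are read), assuming a UTF-8 locale named C.UTF-8 is installed; the sample text is made up:

```shell
# Assumption: the C.UTF-8 locale exists (common on Linux); use e.g. en_US.UTF-8 otherwise.
# bash re-reads the locale when LC_ALL is assigned.
export LC_ALL=C.UTF-8

# Hypothetical input: ten "é" characters, each encoded on 2 bytes in UTF-8 (20 bytes total).
sample=$(mktemp)
printf 'éééééééééé' > "$sample"

# head -c 5 alone would stop in the middle of the third "é"; instead read the
# worst case of 4 bytes per wanted character, then slice by characters.
testString=$(head -c 20 < "$sample")
first5=${testString:0:5}
printf '%s\n' "$first5"

rm -f "$sample"
```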

Method 5

grep -om1 '^.\{50\}' "${filename}"

Another variant (for the first line of the file):

(IFS= read -r line < "${filename}"; printf '%s\n' "${line:0:50}")
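A quick sketch of both variants on made-up files, using 5 characters instead of 50. One thing worth knowing about the grep version: it prints nothing (and exits with status 1) when no line has at least that many characters, whereas the read version simply returns the shorter first line. The read variant needs bash, ksh93 or another shell that supports ${var:offset:length}.

```shell
# Hypothetical input: a long first line followed by a short one.
sample=$(mktemp)
printf 'abcdefghij\nxyz\n' > "$sample"

# grep: -o prints only the matched part, -m1 stops after the first matching line.
grep_out=$(grep -om1 '^.\{5\}' "$sample")

# read: IFS= and -r keep the first line verbatim; then take 5 characters of it.
read_out=$(IFS= read -r line < "$sample"; printf '%s\n' "${line:0:5}")

# A file whose only line is shorter than 5 characters: grep finds no match.
short=$(mktemp)
printf 'xyz\n' > "$short"
grep_short=$(grep -om1 '^.\{5\}' "$short" || true)
read_short=$(IFS= read -r line < "$short"; printf '%s\n' "${line:0:5}")

printf '%s\n' "$grep_out" "$read_out" "[$grep_short]" "$read_short"
rm -f "$sample" "$short"
```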

Method 6

1. For ASCII files, do as @DisplayName says:

head -c 50 file.txt

will print out the first 50 chars of file.txt, for example.

2. For binary data, use hexdump to print it out as hex chars:

hexdump -n 50 -v file.bin

will print out the first 50 bytes of file.bin, for example.

Note that without the -v verbose option, hexdump would replace repeated lines with an asterisk (*) instead. See here: https://superuser.com/questions/494245/what-does-an-asterisk-mean-in-hexdump-output/494613#494613.
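If hexdump is not installed, the POSIX od utility (a different tool, not the one used in this answer) can produce a similar dump, and like hexdump it collapses repeated lines into * unless -v is given. A minimal sketch on a made-up file of zero bytes:

```shell
# Hypothetical binary input: 64 zero bytes.
bin=$(mktemp)
head -c 64 /dev/zero > "$bin"

# -N 50: first 50 bytes only; -t x1: one hex byte per column;
# -A n: no address column; -v: do not collapse repeated lines into "*".
dump=$(od -A n -t x1 -v -N 50 "$bin")
printf '%s\n' "$dump"

rm -f "$bin"
```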

Method 7

To read and output 50 characters (not bytes), with zsh, you can do:

read -eu0 -k50 < "$file"

If the input contains sequences of bytes that don’t form valid characters in the current locale, each of those bytes will be counted as one character.

  • -e: echoes what is read instead of storing it in a variable.
  • -k50: reads 50 characters. read -k was initially meant for reading key presses on the terminal (and would put the terminal in the correct mode to get one keypress at a time), but when used with -u<fd>, it reads characters from the corresponding file descriptor instead.
  • -u0 reads those characters from file descriptor 0 (stdin) which here we redirect from the file.

Method 8

You can use sed for this, which tackles the problem pretty easily:

sed -E 's/^(.{0,50}).*/\1/' yourfile

-E allows us to use Extended regular expressions, instead of basic regular expressions, so we don’t have to use backslashes to escape the more advanced regular expression operators.

s/x/y/ substitutes x with y in each line, where x is a regular expression and y is an expression which can contain literal values or references to capture groups.

^(.{0,50}) matches up to the first 50 characters of each line and marks it as a capture group.

.* matches the rest of the line (if there were more than 50 characters), since we want to replace the whole thing.

\1 is a backreference referring to the first capture group.
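Because sed works line by line, the command above trims every line of the file. A small sketch with made-up input, using {0,5} instead of {0,50} for brevity; appending ;q (print the trimmed first line, then quit) is one way to keep only the start of the first line:

```shell
# Hypothetical two-line input.
input='abcdefghij
xyz'

# Trims each line to at most 5 characters; shorter lines pass through unchanged.
all_lines=$(printf '%s\n' "$input" | sed -E 's/^(.{0,5}).*/\1/')

# ;q makes sed auto-print the trimmed first line and stop reading.
first_line=$(printf '%s\n' "$input" | sed -E 's/^(.{0,5}).*/\1/;q')

printf '%s\n' "$all_lines" "---" "$first_line"
```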


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
