How can I remove duplicates in my .bash_history, preserving order?

August 9, 2022 by Magenaut

I really enjoy using Ctrl+R to search backwards through my command history. I’ve found a few good options I like to use with it:

# ignore duplicate commands, ignore commands starting with a space
export HISTCONTROL=erasedups:ignorespace

# keep the last 5000 entries
export HISTSIZE=5000

# append to the history instead of overwriting (good for multiple connections)
shopt -s histappend

The only problem for me is that erasedups only erases sequential duplicates – so that with this string of commands:

ls
cd ~
ls

The ls command will actually be recorded twice. I’ve thought about periodically running this with cron:

cat .bash_history | sort | uniq > temp.txt
mv temp.txt .bash_history

This would achieve removing the duplicates, but unfortunately the order would not be preserved. If I don’t sort the file first I don’t believe uniq can work properly.

How can I remove duplicates in my .bash_history, preserving order?

Extra Credit:

Are there any problems with overwriting the .bash_history file via a script? For example, if you remove an apache log file I think you need to send a nohup / reset signal with kill to have it flush its connection to the file. If that is the case with the .bash_history file, perhaps I could somehow use ps to check and make sure there are no connected sessions before the filtering script is run?

Answers:

Method 1

So I was looking for the same exact thing after being annoyed by duplicates, and found that if I edit my ~/.bash_profile or my ~/.bashrc with:

export HISTCONTROL=ignoreboth:erasedups

It does exactly what you want: it keeps only the latest occurrence of any command. ignoreboth is shorthand for ignorespace:ignoredups, and together with erasedups it gets the job done.

At least in the terminal on my Mac with bash, this works perfectly.
Found it here on askubuntu.com.

Method 2

Sorting the history

This command works like sort|uniq, but keeps the lines in place

nl|sort -k 2|uniq -f 1|sort -n|cut -f 2

Basically, nl prepends its line number to each line. After sort|uniq-ing, all lines are sorted back into their original order (using the line number field) and the line number field is removed from the lines.

This solution has the flaw that it is undefined which representative of a class of equal lines will make it in the output and therefore its position in the final output is undefined. However, if the latest representative should be chosen you can sort the input by a second key:

nl|sort -k2 -k 1,1nr|uniq -f1|sort -n|cut -f2
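
As a rough illustration, the second pipeline can be wrapped in a function and run on the sample from the question. The function name dedup_keep_last_nl is made up for this sketch, and it assumes the history contains no blank lines and no tab characters (nl does not number blank lines by default, and cut -f2 would cut a command off at its first tab).

```bash
# Hypothetical helper: de-duplicate stdin, keeping the LAST occurrence of each line.
dedup_keep_last_nl() {
  nl |                    # prefix each line with its line number and a tab
    sort -k2 -k1,1nr |    # group equal lines; within a group, highest line number first
    uniq -f1 |            # ignore the number field; keep the first line of each group
    sort -n |             # restore the original order by line number
    cut -f2               # drop the line-number field
}

# Sample from the question, plus one extra command.
printf '%s\n' 'ls' 'cd ~' 'ls' 'pwd' | dedup_keep_last_nl
```

With this input the first ls is dropped and the later one survives, so the remaining order is cd ~, ls, pwd.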

Managing .bash_history

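To manage the file from a running shell, history -a appends the current session's new entries to the history file, history -w writes the whole in-memory list back to it, and history -r re-reads the file into memory.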

Method 3

Found this solution in the wild and tested:

awk '!x[$0]++'

The first time a specific value of a line ($0) is seen, the value of x[$0] is zero.
The value of zero is inverted with ! and becomes one.
A statement that evaluates to one causes the default action, which is print.

Therefore, the first time a specific $0 is seen, it is printed.

Every subsequent time (the repeats), the value of x[$0] has already been incremented,
so its negated value is zero, and a statement that evaluates to zero doesn’t print.

To keep the last repeated value, reverse the history and use the same awk:

awk '!x[$0]++' ~/.bash_history                 # keep the first value repeated.

tac ~/.bash_history | awk '!x[$0]++' | tac     # keep the last.
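
A small sketch of both variants on made-up input may help show the difference. The function names are hypothetical, and tac is assumed to be available (it ships with GNU coreutils; on BSD or macOS, tail -r is a common substitute).

```bash
# Hypothetical helpers; both read stdin and write to stdout.
dedup_keep_first() { awk '!seen[$0]++'; }
dedup_keep_last()  { tac | awk '!seen[$0]++' | tac; }

# Keeps the first occurrence of each line: ls, cd ~, pwd
printf '%s\n' 'ls' 'cd ~' 'ls' 'pwd' 'cd ~' | dedup_keep_first

# Keeps the last occurrence of each line: ls, pwd, cd ~
printf '%s\n' 'ls' 'cd ~' 'ls' 'pwd' 'cd ~' | dedup_keep_last
```

For a history file, keeping the last occurrence is usually what you want, because the most recently used commands stay closest to the end.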

Method 4

Extending Clayton's answer:

tac "$HISTFILE" | awk '!x[$0]++' | tac | sponge "$HISTFILE"

tac reverses the file. Make sure you have moreutils installed so that sponge is available; otherwise use a temp file.
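
If sponge is not available, the temp-file route might look like the sketch below. The function name dedup_file_in_place is hypothetical, and it assumes tac and mktemp are installed. The temp file is created next to the target so that the final mv stays on one filesystem.

```bash
# Hypothetical helper: rewrite FILE in place, keeping the last occurrence of each line.
dedup_file_in_place() {
  local file=$1 tmp
  [ -r "$file" ] || return 1                  # refuse to touch an unreadable file
  tmp=$(mktemp "${file}.XXXXXX") || return 1  # temp file in the same directory
  if tac "$file" | awk '!x[$0]++' | tac > "$tmp"; then
    mv "$tmp" "$file"                         # replace the original only on success
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage (assumes HISTFILE is set, as it is in an interactive bash session):
# dedup_file_in_place "$HISTFILE"
```

Note that the replaced file takes the permissions of the temp file, which mktemp creates readable and writable by the owner only.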

Method 5

This is an old post, but a perpetual issue for users who want to have multiple terminals open, and have the history synched between windows, but not duplicated.

My solution in .bashrc:

shopt -s histappend
export HISTCONTROL=ignoreboth:erasedups
export PROMPT_COMMAND="history -n; history -w; history -c; history -r"
tac "$HISTFILE" | awk '!x[$0]++' > /tmp/tmpfile  &&
                tac /tmp/tmpfile > "$HISTFILE"
rm /tmp/tmpfile

- histappend option adds the history of the buffer to the end of the history file ($HISTFILE)
- ignoreboth and erasedups prevent duplicate entries from being saved in the $HISTFILE
- The prompt command updates the history cache:
  - history -n reads all lines from $HISTFILE that may have occurred in a different terminal since the last carriage return
  - history -w writes the updated buffer to $HISTFILE
  - history -c wipes the buffer so no duplication occurs
  - history -r re-reads the $HISTFILE, appending to the now blank buffer
- the awk script stores the first occurrence of each line it encounters. tac reverses it, and then reverses it back so that it can be saved with the most recent commands still most recent in the history
- rm removes the /tmp file

Every time you open a new shell, the history has all dupes wiped,
and every time you hit the Enter key
in a different shell/terminal window,
it updates this history from the file.

Method 6

These would keep the last duplicated lines:

ruby -i -e 'puts readlines.reverse.uniq.reverse' ~/.bash_history
tac ~/.bash_history | awk '!a[$0]++' | tac > t; mv t ~/.bash_history

Method 7

Almost none of the answers here take into account history files with timestamps or multi-line history entries.

I needed a way to merge my memory and disk history when my shell session exits, (from multiple terminals), or just merge histories from one terminal to another.

I looked for a long time but could not find anything that did it in a way I considered correct. So I eventually DIY’ed a solution…

Here is my solution: merge the on-disk “.bash_history” with the in-memory shell ‘history’, preserving timestamp ordering and the command order within those timestamps.

It can optionally remove non-unique commands (even if multi-line) and/or clean out simple and/or sensitive commands, according to defined Perl REs. Adjust to suit!

This is the result… https://antofthy.gitlab.io/software/history_merge.bash.txt

You can customise it as you like, or make it a bash function if you want, or adjust the commands that it ‘cleans’ from the history.

I run this either on demand using an alias (like ‘hm’ for history merge) or when a shell logs out (from the “.bash_logout”), unless I disabled shell history (by unsetting “$HISTFILE” using a ‘hd’ alias)

Enjoy.
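
To illustrate why timestamps and multi-line entries need record-aware handling, here is a minimal awk sketch. It is not the linked script. The function name dedup_timestamped_history is hypothetical, and it assumes every entry starts with a #<epoch> line, which is how bash writes the file when HISTTIMEFORMAT is set. A command line that itself looks like #<digits> would be misread as a timestamp.

```bash
# Hypothetical helper: print FILE with duplicate commands removed, keeping the
# last occurrence of each command together with its timestamp line.
dedup_timestamped_history() {
  awk '
    # A timestamp line starts a new entry.
    /^#[0-9]+$/ { n++; stamp[n] = $0; lines[n] = 0; next }
    # Lines before any timestamp form an entry without one.
    n == 0      { n = 1; stamp[1] = ""; lines[1] = 0 }
    # Every other line belongs to the current (possibly multi-line) command.
                { cmd[n] = (lines[n]++ ? cmd[n] "\n" : "") $0 }
    END {
      # Walk backwards so the newest copy of each command wins.
      for (i = n; i >= 1; i--)
        if (lines[i] && !(cmd[i] in seen)) { seen[cmd[i]] = 1; keep[i] = 1 }
      # Print the survivors in their original order.
      for (i = 1; i <= n; i++)
        if (keep[i]) { if (stamp[i] != "") print stamp[i]; print cmd[i] }
    }
  ' "$1"
}

# Usage: write to a temp file first, then move it over the original.
# dedup_timestamped_history ~/.bash_history > ~/.bash_history.new
```

Because the whole command text is used as the key, a multi-line entry is only treated as a duplicate when all of its lines match.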

Method 8

I have timestamps on mine, so most solutions that mess with the files don't work. I also have a directory for the history files so that they are specific to each host. I used some of the things found here to remove duplicates and such from history before writing back to the history file, but sometimes I have a few shells running on the same host, which then keeps those duplicates in there. My solution to clean up the mess every now and then is to create an executable file with this in it:

#!/bin/sh

for file in ~/.bash_history/*
do
  tac "$file" | awk '!visited[$0]++' | tac | sed 'N;/^#.*\n#.*/!P;D' > tempfile;
  mv tempfile "$file"
done

Save it and execute it.
Basically: reverse file and use awk to clean duplicates while keeping the last one, reverse again, then use sed to delete the consecutive timestamps while keeping the last one. Save file to tempfile, move tempfile to history file. My history directory went from 109M to 1008K 🙂
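
The pipeline can be tried on a small made-up file, as sketched below. The function name clean_timestamped_file and the sample timestamps are invented for illustration. The sketch assumes the sed pattern is meant to contain \n (a newline between two timestamp lines) and that GNU sed is used, since it relies on sed printing the last line when N runs out of input.

```bash
# Hypothetical helper: apply the pipeline described above to a single file.
clean_timestamped_file() {
  local file=$1 tmp
  tmp=$(mktemp "${file}.XXXXXX") || return 1
  tac "$file" |
    awk '!visited[$0]++' |                # drop earlier duplicates (input is reversed here)
    tac |
    sed 'N;/^#.*\n#.*/!P;D' > "$tmp" &&   # drop a timestamp directly followed by another
    mv "$tmp" "$file"
}

demo=$(mktemp)
printf '%s\n' '#1000' 'ls' '#1001' 'cd ~' '#1002' 'ls' > "$demo"
clean_timestamped_file "$demo"
cat "$demo"
rm -f "$demo"
```

In this sample, awk removes the first ls, which leaves #1000 directly in front of #1001, and sed then drops the orphaned #1000. What remains is #1001, cd ~, #1002, ls.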

Method 9

Uniquely recording every new command is tricky. First you need to add to
~/.profile or similar:

HISTCONTROL=erasedups
PROMPT_COMMAND='history -w'

Then you need to add to ~/.bash_logout:

history -a
history -w

Method 10

I’ve written a small program that lets you clean your bash/shell history, also retroactively, while preserving its order:

https://gitlab.com/vn971/shell-history-cleaner

USAGE:
    shell-history-cleaner [OPTIONS] <TARGET_FILE>

ARGS:
    <TARGET_FILE>
            Target file to clean. You can use "$HISTFILE" to clean up the shell history.

OPTIONS:
    -d, --dedup
            De-duplicate lines to only keep one last occurrence of each dup. In contrast to bash
            built-in deduplication, this also works if the duplicates are sparse and do not
            immediately follow each other.

    -r, --remove <REMOVE>
            Lines to remove. For example, 'yt-dlp.*' will remove lines starting with 'yt-dlp'.
            Can be specified multiple times.
            
            The patterns are regular expressions, assuming the whole line is matched, as defined
            here: https://docs.rs/regex/latest/regex/#syntax
            
            Another real-life example:
            '(ps aux.*|git checkout .*|git branch .*| .*|yt-dlp .*|chmod .*|echo .*|man .*)'

    -h, --help
            Print help information

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.