join : “File 2 not in sorted order”

I’ve got two files _jeter3.txt and _jeter1.txt

I’ve checked they are both sorted on the 20th column using sort -c

sort -t '     ' -c -k20,20 _jeter3.txt
sort -t '     ' -c -k20,20 _jeter1.txt
#no errors

But when I try to join the two files, join reports that the second file is not sorted:
join -t '   ' -1 20 -2 20 _jeter1.txt _jeter3.txt > /dev/null
join: File 2 is not in sorted order

I don’t understand why.
cat /etc/*-release #FYI
openSUSE 11.0 (i586)
VERSION = 11.0

UPDATE: using sort -f and join -i (both case-insensitive) fixes the problem, but it doesn’t explain my initial problem.

UPDATE: versions of sort & join:

> join --version
join (GNU coreutils) 6.11
Copyright (C) 2008 Free Software Foundation, Inc.
(...)

> sort --version
sort (GNU coreutils) 6.11
Copyright (C) 2008 Free Software Foundation, Inc.
(...)

Answers:

Method 1

I got the same error with Ubuntu 11.04, with sort and join both in version (GNU coreutils) 8.5.

By default they appear to be incompatible. At first the sort command even looks buggy: its output is the same with or without the -f (--ignore-case) option, and aaB always sorts before aBa. Non-alphanumeric characters also seem to be ignored (abc sorts before ab-x).

Join seems to expect the opposite… But I have a solution

In fact, this is linked to the collation sequence: using LANG=en_EN sort -k 1,1 <myfile> ... then LANG=en_EN join ... eliminates the message.
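To see how much the collation sequence matters, here is a minimal sketch that sorts the four strings mentioned above in the C locale, where strings are compared byte by byte. The file names keys.txt and keys.c-sorted.txt are made up for illustration, and the order shown in the comment holds only for LC_ALL=C; other locales may order the same strings differently, which is the kind of mismatch that trips up join.

```shell
# Hypothetical sample file containing the strings from the answer above.
printf 'aaB\naBa\nabc\nab-x\n' > keys.txt

# LC_ALL overrides LANG and LC_COLLATE, so this forces plain byte order.
LC_ALL=C sort keys.txt > keys.c-sorted.txt
cat keys.c-sorted.txt
# In the C locale the order is: aBa, aaB, ab-x, abc
# (uppercase bytes sort before lowercase ones, and '-' is not ignored).
```

This is the reverse of the order described above for both pairs, so the same file can look sorted under one locale and unsorted under another.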

Internationalisation is the root of evil… (nobody documents it clearly).

Method 2

sort by default uses the entire line as the key

join uses only the specified field as the key.

You must correct this incompatibility by restricting sort to use only the key you want to join on.

The Join man page states:

Important: FILE1 and FILE2 must be sorted on the join fields. E.g., use ‘sort -k 1b,1’ if ‘join’ has no options. Note, comparisons honor the rules specified by ‘LC_COLLATE’. If the input is not sorted and some lines cannot be joined, a warning message will be given.
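As a rough sketch of that advice, the example below sorts each file on exactly the field that join will use, with the same separator and the same locale for every command. The file names (left.tsv, right.tsv and so on) and their contents are invented for illustration; TAB is built with printf only so that the snippet also runs in shells without $'\t' support.

```shell
TAB=$(printf '\t')

# Hypothetical tab-separated inputs: the join key is column 2 of left.tsv
# and column 1 of right.tsv.
printf 'x\tb\ny\ta\n' > left.tsv
printf 'b\t1\na\t2\n' > right.tsv

# Sort each file on its join field only, not on the whole line.
LC_ALL=C sort -t "$TAB" -k2,2 left.tsv  > left.sorted
LC_ALL=C sort -t "$TAB" -k1,1 right.tsv > right.sorted

# Join on the same fields, with the same separator and locale.
LC_ALL=C join -t "$TAB" -1 2 -2 1 left.sorted right.sorted > joined.tsv
cat joined.tsv
```

join prints the join field first, followed by the remaining fields of the first file and then those of the second.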

Method 3

If you are sure you properly sorted your input files and their lines can be paired, you can avoid the above error by running join --nocheck-order file1.txt file2.txt
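A word of caution, sketched with made-up files: --nocheck-order only silences the check, it does not make join cope with unsorted input. In the hypothetical example below the first file is deliberately out of order, and GNU join quietly drops a pairable line instead of complaining.

```shell
# Hypothetical inputs: unsorted1.txt is NOT sorted on its first field.
printf 'b 1\na 2\n' > unsorted1.txt
printf 'a x\nb y\n' > sorted2.txt

# No error is reported, but the 'a' lines are never paired.
LC_ALL=C join --nocheck-order unsorted1.txt sorted2.txt > nocheck.out
cat nocheck.out
```

So this option is only safe when the inputs really are sorted the way join expects.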

Method 4

Were you sorting with numbers? I found that zero-padding the column that I was joining on solved this issue for me.

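# Zero-pad column 20 to 6 digits, keep tabs on output, then sort on that column.
cat file.txt |
     awk -F'\t' -v OFS='\t' '{ $20 = sprintf("%06d", $20); print }' |
     sort -t $'\t' -k20,20 > readytojoin.txt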
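The reason padding helps is that join compares keys as strings, so 10 sorts before 9 unless both have the same width. Here is a small sketch on a hypothetical two-column file (nums.tsv is an invented name, and the key is in column 1 rather than column 20 to keep it short); it assumes the keys are non-negative integers of at most six digits.

```shell
TAB=$(printf '\t')

# Hypothetical input with numeric keys of different widths.
printf '10\tx\n9\ty\n' > nums.tsv

# Pad the key to a fixed width, keep tabs as the output separator,
# then sort on the padded key as a string.
awk -F'\t' -v OFS='\t' '{ $1 = sprintf("%06d", $1); print }' nums.tsv |
    LC_ALL=C sort -t "$TAB" -k1,1 > padded.tsv
cat padded.tsv
```

After padding, string order and numeric order agree, so 000009 comes before 000010. Both files have to be padded the same way for the keys to match.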

Method 5

Note that if you see this error even though you have already sorted on a specific column (e.g. sort -k4,4) and are beating your head against the wall, you may also need to set the separator for the sort command.

Apparently OP already did this with -t ' ', but for normal tab-separated text I’d recommend

sort -t $'\t' ...

By default, sort splits fields on any blank (spaces as well as tabs), even in a file that looks tab-separated. This matters especially if there are spaces inside the column you are sorting on.

Then if you passed that sorted data to join, and you have

join -t $'\t' ...

then the mismatch between the two separators ends up causing the error message about the file being unsorted. As noted in another answer, join may not accept -t ' ' though.
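A small sketch of the difference, using an invented file cols.tsv whose first column contains a space: without -t, sort sees the space as a field boundary and picks a different "column 2" than join -t with a tab would.

```shell
TAB=$(printf '\t')

# Hypothetical tab-separated file; column 1 of the first line contains a space.
printf 'p q\ta\np\tz\n' > cols.tsv

# Default separator: field 2 is " q" on line 1 and "<TAB>z" on line 2.
LC_ALL=C sort -k2,2 cols.tsv > by_blank_fields.tsv

# Explicit tab separator: field 2 is "a" on line 1 and "z" on line 2.
LC_ALL=C sort -t "$TAB" -k2,2 cols.tsv > by_tab_fields.tsv

head -n 1 by_blank_fields.tsv
head -n 1 by_tab_fields.tsv
```

The two commands put the lines in opposite orders, so a file sorted the first way is not sorted as far as a tab-separated join on column 2 is concerned.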

Method 6

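LC_ALL=C sort ...
LC_ALL=C join ...

This will solve your problem. The issue, as pointed out by @Michael, is the collation sequence, which depends on your locale settings. Note that the variables sort and join actually read are LC_ALL, LC_COLLATE and LANG, not LOCALE.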
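One way to make sure both commands see the same setting is to export the variable once for the whole script. The sketch below does that with two invented files (c1.txt and c2.txt) whose keys mix upper case, lower case and punctuation, checks both with sort -c, and only then joins them.

```shell
# Export once so that every sort and join below uses byte-order collation.
export LC_ALL=C

# Hypothetical inputs, sorted on the first field as the join man page suggests.
printf 'aaB 1\naBa 2\nab-x 3\n' | sort -k 1b,1 > c1.txt
printf 'ab-x X\naBa Y\n'        | sort -k 1b,1 > c2.txt

# Verify the order under the same locale, then join on the first field.
sort -c -k 1b,1 c1.txt &&
    sort -c -k 1b,1 c2.txt &&
    join c1.txt c2.txt > c_joined.txt
cat c_joined.txt
```

Because sort, sort -c and join all run under the same locale, the check and the join agree about what "sorted" means.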

Method 7

For join, the argument after -t is a single character. For sort, you can supply a longer separator. I think you may be joining the files on a different field than you intend, and that ignoring case solves the problem only by coincidence.

And I agree with Gilles that sample data would be helpful.


All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.