I’m running tests of a variable-size-and-contents test set. Data files are added and removed frequently. I’m looking for an automated way of gathering a file list.
All files are in subdirectories of D; I need the full directory and name added to a text file. However, I only need those files that have a “paired” file: one with the same base name but a different extension. So, if there is a MyFileName.A and a MyFileName.B, then I want D/.../MyFileName added to the file list.
There are .A files without .B files, but no .B files without .A files. If a .A has a .B file, then both files are in the same directory.
Any advice?
Answers:
Method 1
If none of the filenames contain any newlines, you can do:
find D -type f \( -name '*.A' -o -name '*.B' \) | sed 's/\.[^.]*$//' | sort | uniq -d >paired_files
This should work in the more general case where there are .B files without .A files.
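To see what the pipeline does, it can be run on a throwaway tree. The sketch below builds a made-up layout under D inside a temporary directory (the directory and file names are hypothetical) and runs the same newline-separated pipeline on it.

```shell
# Build a hypothetical test tree in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p D/x D/y
# Two complete pairs and one .A without a .B partner.
touch D/x/one.A D/x/one.B D/y/two.A D/y/two.B D/y/three.A

# Strip the last extension, then keep only names that occur twice.
find D -type f \( -name '*.A' -o -name '*.B' \) |
  sed 's/\.[^.]*$//' | sort | uniq -d >paired_files

cat paired_files
```

paired_files should contain D/x/one and D/y/two; D/y/three is dropped because its base name appears only once.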
To handle any filename using recent GNU tools:
find D -type f \( -name '*.A' -o -name '*.B' \) -print0 | sed -z 's/\.[^.]*$//' | sort -z | uniq -dz | tr '\0' '\n' >paired_files
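To see why the NUL-separated variant matters, the sketch below (assuming GNU find, sed, sort, uniq and tr, as the answer does) creates a hypothetical pair whose base name contains a newline and counts how many entries each pipeline reports. It counts NUL bytes rather than printing the names, since the question is how many records survive intact.

```shell
# Hypothetical test tree in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
nl='
'
mkdir -p D/sub
# One pair whose base name contains a newline, one ordinary pair,
# and one unpaired .A.
touch "D/sub/odd${nl}name.A" "D/sub/odd${nl}name.B" \
      D/sub/plain.A D/sub/plain.B D/sub/lonely.A

# Newline-separated version: the odd name is split across two lines.
line_count=$(find D -type f \( -name '*.A' -o -name '*.B' \) |
  sed 's/\.[^.]*$//' | sort | uniq -d | wc -l | tr -d ' ')

# NUL-separated version: count the NUL-terminated records it emits.
nul_count=$(find D -type f \( -name '*.A' -o -name '*.B' \) -print0 |
  sed -z 's/\.[^.]*$//' | sort -z | uniq -dz | tr -cd '\0' | wc -c | tr -d ' ')

echo "newline version: $line_count entries; NUL version: $nul_count entries"
```

The newline version reports 3 entries because the odd name falls apart into the fragments D/sub/odd and name, and each fragment appears twice. The NUL version reports the correct 2. Note that the final tr '\0' '\n' in the answer turns the records back into lines, so a name containing a newline will still span two lines in paired_files. A consumer that must handle such names needs to keep the NUL terminators.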
Method 2
If your statement “there are no .B files without .A files” is true, then get a list of the .B files and remove the extension:
find "$directory_to_search" -name '*.B' | sed -r -e 's~(.*)\.B$~\1~'
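Method 2 relies entirely on every .B having a matching .A. If you would rather not depend on that, the following POSIX sh sketch (a variant, not the answer's command) strips .B with parameter expansion and prints the base name only when the .A really exists. The tree under D is made up for the demo, and it includes a deliberately orphaned .B file.

```shell
# Hypothetical test tree in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p D/x D/y
# D/y/orphan.B breaks the "no .B without .A" assumption on purpose.
touch D/x/one.A D/x/one.B D/y/two.A D/y/two.B D/y/three.A D/y/orphan.B

find D -type f -name '*.B' | while IFS= read -r b; do
  base=${b%.B}                # strip the trailing .B
  if [ -f "$base.A" ]; then   # keep it only if the .A partner exists
    printf '%s\n' "$base"
  fi
done | sort >paired_files

cat paired_files
```

The plain Method 2 command would also list D/y/orphan. This loop skips it and leaves D/x/one and D/y/two. Like the first command in Method 1, it assumes no file name contains a newline.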
Method 3
With zsh:
print -rl mydir/**/*.A(.e_'REPLY=$REPLY:r; [[ -f $REPLY.B ]]'_)
:r removes the extension, so if $REPLY was mydir/somedir/somefile.A, after REPLY=$REPLY:r it becomes mydir/somedir/somefile, which is what gets printed; [[ -f $REPLY.B ]] keeps the match only if the paired .B file exists;
the rest is similar to this answer.
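If zsh isn't available, the same idea (walk the .A files and print the base name when a .B sits next to it) can be roughly approximated in POSIX sh with find -exec. This is one possible translation, not part of the answer, and the tree under D is hypothetical.

```shell
# Hypothetical test tree in a temporary directory.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
cd "$tmp" || exit 1
mkdir -p D/x D/y
touch D/x/one.A D/x/one.B D/y/two.A D/y/two.B D/y/three.A

# {} + passes batches of .A paths to one sh, which loops over "$@".
# For each path, drop .A and print the base if base.B is a regular file.
find D -type f -name '*.A' -exec sh -c '
  for a in "$@"; do
    base=${a%.A}
    if [ -f "$base.B" ]; then printf "%s\n" "$base"; fi
  done
' sh {} + | sort >paired_files

cat paired_files
```

As with the zsh version, only D/x/one and D/y/two are listed. Passing the paths as arguments, instead of splicing them into the script text, keeps names with spaces or quotes safe.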
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.