I have two different files and I want to compare their contents line by line, and write the lines they have in common to a third file. Note that both files contain some blank lines.
Here is my pseudo code:
file1 = open('some_file_1.txt', 'r')
file2 = open('some_file_2.txt', 'r')
FO = open('some_output_file.txt', 'w')

for line1 in file1:
    for line2 in file2:
        if line1 == line2:
            FO.write("%s\n" %(line1))

FO.close()
file1.close()
file2.close()
However, when I do this, I get lots of blank lines in my FO file. It seems that the blank lines the files have in common are written too. I want to write only the text part. Can somebody please help me?
For example: my first file (file1) contains data:
Config: Hostname = TUVALU BT: TS_Ball_Update_Threshold = 0.2 BT: TS_Player_Search_Radius = 4 BT: Ball_Template_Update = 0
while second file (file2) contains data:
Pole_ID = 2 Width = 1280 Height = 1024 Color_Mode = 0 Sensor_Scale = 1 Tracking_ROI_Size = 4 Ball_Template_Update = 0
If you notice, the last two lines of each file are the same, so I want to write those lines to my FO file. The problem with my approach is that it also writes the blank lines the files have in common. Should I use regex for this problem? I do not have experience with regex.
Answers:
Method 1
This solution reads both files in one pass, excludes blank lines, and prints common lines regardless of their position in the file:
with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
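Because a set has no order, the lines above may be written in any sequence. If the order of the first file matters, a minimal sketch along the following lines might help. The function name common_lines and the sample data are made up for illustration, and plain lists stand in for the open files.

```python
def common_lines(lines1, lines2):
    """Return the non-blank lines of lines1 that also occur in lines2,
    in the order they appear in lines1, without duplicates."""
    # Compare without the trailing newline so that a final line
    # lacking one still matches.
    wanted = {line.rstrip("\n") for line in lines2}
    seen = set()
    result = []
    for line in lines1:
        text = line.rstrip("\n")
        # text.strip() is empty (false) for blank or whitespace-only lines.
        if text.strip() and text in wanted and text not in seen:
            seen.add(text)
            result.append(text)
    return result


# Hypothetical sample data, not the files from the question.
lines_a = ["Width = 1280\n", "\n", "Color_Mode = 0\n", "Height = 1024\n"]
lines_b = ["Height = 1024\n", "\n", "Width = 1280"]

print(common_lines(lines_a, lines_b))
```

Passing two open file objects instead of lists should work the same way, since both are iterated line by line.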
Method 2
Yet another example…
from __future__ import print_function  # Only for Python 2

with open('file1.txt') as f1, open('file2.txt') as f2, open('outfile.txt', 'w') as outfile:
    for line1, line2 in zip(f1, f2):
        if line1 == line2:
            print(line1, end='', file=outfile)
And if you want to eliminate common blank lines, just change the if statement to:
if line1.strip() and line1 == line2:
.strip() removes all leading and trailing whitespace, so if that’s all that’s on a line, it will become an empty string "", which is considered false.
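To see the effect of that guard, here is a small sketch that runs the same comparison on in-memory files. io.StringIO stands in for the real files, and the function name same_position_lines and the sample data are invented for this example.

```python
import io


def same_position_lines(f1, f2):
    """Yield lines that are identical at the same position in both
    files and that contain more than whitespace."""
    for line1, line2 in zip(f1, f2):
        # line1.strip() is "" (false) for blank or whitespace-only lines.
        if line1.strip() and line1 == line2:
            yield line1


# Hypothetical sample data with a blank line and a whitespace-only line.
first = io.StringIO("a = 1\n\nb = 2\n   \nc = 3\n")
second = io.StringIO("a = 1\n\nb = 9\n   \nc = 3\n")

result = list(same_position_lines(first, second))
print(result)  # ['a = 1\n', 'c = 3\n']
```

Note that zip stops at the end of the shorter file, so extra trailing lines in the longer one are never compared.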
Method 3
If you specifically want the difference between the two files (the lines of the first file that do not appear in the second), then this might help:
with open('first_file', 'r') as file1:
    with open('second_file', 'r') as file2:
        difference = set(file1).difference(file2)

difference.discard('\n')

with open('diff.txt', 'w') as file_out:
    for line in difference:
        file_out.write(line)
Method 4
If order is preserved between files, you might also prefer difflib. Although Robᵩ’s result is the bona-fide standard for intersections, you might actually be looking for a rough diff-like comparison:
from difflib import Differ

with open('cfg1.txt') as f1, open('cfg2.txt') as f2:
    differ = Differ()

    for line in differ.compare(f1.readlines(), f2.readlines()):
        if line.startswith(" "):
            print(line[2:], end="")
That said, this has a different behaviour to what you asked for (order is important) even though in this instance the same output is produced.
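The difference in behaviour can be illustrated with a short sketch that compares the set intersection with the lines Differ reports as unchanged. The helper name ordered_common and the sample data are made up for this example.

```python
from difflib import Differ


def ordered_common(lines1, lines2):
    """Return the lines Differ reports as unchanged, i.e. the ones
    prefixed with two spaces in its output."""
    return [line[2:] for line in Differ().compare(lines1, lines2)
            if line.startswith("  ")]


# Hypothetical sample data: the same two lines, in opposite order.
a = ["Width = 1280\n", "Height = 1024\n"]
b = ["Height = 1024\n", "Width = 1280\n"]

set_common = set(a).intersection(b)
diff_common = ordered_common(a, b)

print(len(set_common))   # 2: the set ignores order
print(len(diff_common))  # 1: only one line can stay in sequence
```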
Method 5
Once a file object has been iterated over, it is exhausted. That is why the inner loop over file2 in the question only produces lines during the first pass of the outer loop.
>>> f = open('1.txt', 'w')
>>> f.write('1\n2\n3\n')
6
>>> f.close()
>>> f = open('1.txt', 'r')
>>> for line in f: print(line, end='')
...
1
2
3
# exhausted, another iteration does not produce anything.
>>> for line in f: print(line, end='')
...
>>>
Use file.seek (or close/open the file) to rewind the file:
>>> f.seek(0)
0
>>> for line in f: print(line, end='')
...
1
2
3
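Applied to the nested loop from the question, rewinding might look like the sketch below. io.StringIO stands in for the real files, and the function name common_lines_nested and the sample data are invented for illustration. It is quadratic in the number of lines, so the set-based approach is likely the better choice for large files.

```python
import io


def common_lines_nested(file1, file2):
    """Collect the non-blank lines of file1 that also appear in file2,
    rewinding file2 before every pass of the inner loop."""
    common = []
    for line1 in file1:
        file2.seek(0)  # rewind, otherwise file2 stays exhausted
        for line2 in file2:
            # Skip blank lines; stop at the first match for this line.
            if line1.strip() and line1 == line2:
                common.append(line1)
                break
    return common


# Hypothetical sample data, not the files from the question.
first = io.StringIO("x = 1\n\ny = 2\n")
second = io.StringIO("y = 2\n\nx = 1\n")

found = common_lines_nested(first, second)
print(found)  # ['x = 1\n', 'y = 2\n']
```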
Method 6
Try this:
from __future__ import with_statement  # only needed on Python 2.5

filename1 = "G:\\test1.TXT"
filename2 = "G:\\test2.TXT"

with open(filename1) as f1:
    with open(filename2) as f2:
        file1list = f1.read().splitlines()
        file2list = f2.read().splitlines()
        list1length = len(file1list)
        list2length = len(file2list)
        if list1length == list2length:
            # Compare the files position by position.
            for index in range(len(file1list)):
                if file1list[index] == file2list[index]:
                    print(file1list[index] + "==" + file2list[index])
                else:
                    print(file1list[index] + "!=" + file2list[index] + " Not-Equal")
        else:
            print("difference in the size of the files and number of lines")
Method 7
I have just been faced with the same challenge, but I thought, “Why program this in Python if you can solve it with a simple grep?”, which led to the following Python code:
import subprocess
from subprocess import PIPE

try:
    # Run grep in both directions and capture its output and errors.
    output1, errors1 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file1.txt", "c:\\file2.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
    output2, errors2 = subprocess.Popen(["c:\\cygwin\\bin\\grep", "-Fvf", "c:\\file2.txt", "c:\\file1.txt"], shell=True, stdout=PIPE, stderr=PIPE).communicate()
    if (len(output1) + len(output2) + len(errors1) + len(errors2) > 0):
        print("Compare result : There are differences:")
        if (len(output1) + len(output2) > 0):
            print("  Output differences : ")
            print(output1)
            print(output2)
        if (len(errors1) + len(errors2) > 0):
            print("  Errors : ")
            print(errors1)
            print(errors2)
    else:
        print("Compare result : Both files are equal")
except Exception as ex:
    print("Compare result : Exception during comparison")
    print(ex)
    raise
The trick behind this is the following:
grep -Fvf file1.txt file2.txt prints every line of file2.txt that matches none of the lines of file1.txt, treated as fixed strings. Empty output therefore means that all entries in file2.txt are present in file1.txt. By doing this in both directions we can see if the contents of both files are “equal”. I put “equal” between quotes because duplicate lines are disregarded in this way of working.
Obviously, this is just an example: you can replace grep with any command-line file comparison tool.
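For readers without grep at hand, the two-directional check described above can be roughly approximated in pure Python. This is only a sketch under the assumption that substring matching is what is wanted (as with grep -F without -x). The names unmatched_lines and look_equal and the sample data are made up for illustration.

```python
def unmatched_lines(patterns, lines):
    """Rough analogue of `grep -Fvf patterns_file lines_file`: return
    the lines that contain none of the patterns as a substring."""
    return [line for line in lines
            if not any(pattern in line for pattern in patterns)]


def look_equal(lines1, lines2):
    """True when neither side has a line the other side does not match.
    Duplicate lines and line order are disregarded."""
    return (not unmatched_lines(lines1, lines2)
            and not unmatched_lines(lines2, lines1))


# Hypothetical sample data, with newlines already stripped.
a = ["Width = 1280", "Height = 1024"]
b = ["Height = 1024", "Width = 1280", "Width = 1280"]

print(look_equal(a, b))  # True, despite the duplicate and the order
print(unmatched_lines(a, ["Width = 1280", "Color_Mode = 0"]))
```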
Method 8
difflib is well worth the effort, with nice condensed output.
from pathlib import Path
import difflib

mypath = '/Users/x/lib/python3'
file17c = Path(mypath, 'oop17c.py')
file18c = Path(mypath, 'oop18c.py')

with open(file17c) as file_1:
    file1 = file_1.readlines()

with open(file18c) as file_2:
    file2 = file_2.readlines()

for line in difflib.unified_diff(
        file1, file2, fromfile=str(file17c), tofile=str(file18c), lineterm=''):
    print(line)
Output:
+ … unique stuff present in file18c
- … stuff absent in file18c but present in file17c
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.