Remove lines that contain certain string

I’m trying to read text from a text file, go through its lines, and delete the lines that contain specific strings (in this case ‘bad’ and ‘naughty’).
The code I wrote goes like this:

infile = file('./oldfile.txt')

newopen = open('./newfile.txt', 'w')
for line in infile :

    if 'bad' in line:
        line = line.replace('.' , '')
    if 'naughty' in line:
        line = line.replace('.', '')
    else:
        newopen.write(line)

newopen.close()

I wrote it like this, but it doesn’t work.

One important thing: if the content of the text is like this:

good baby
bad boy
good boy
normal boy

I don’t want the output to have empty lines, so not like this:

good baby

good boy
normal boy

but like this:

good baby
good boy
normal boy

What should I change in the code above?

Answers:

Thank you for visiting the Q&A section on Magenaut.

Method 1

You can make your code simpler and more readable like this:

bad_words = ['bad', 'naughty']

with open('oldfile.txt') as oldfile, open('newfile.txt', 'w') as newfile:
    for line in oldfile:
        if not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

This uses a context manager (the with statement) and the built-in any() function.
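A minimal sketch of the same any() check wrapped in a function, so it can be tried on in-memory text. The name remove_lines_containing is hypothetical, and io.StringIO stands in for the real files (an assumption made only so the snippet runs on its own). Each kept line is written together with its own trailing newline and rejected lines are never written, which is why no empty lines appear in the output.

```python
import io

def remove_lines_containing(src, dst, bad_words):
    """Copy lines from src to dst, skipping any line that contains a bad word.

    Returns the number of lines written.
    """
    written = 0
    for line in src:
        # any() stops at the first bad word found in the line
        if not any(bad_word in line for bad_word in bad_words):
            dst.write(line)  # the line keeps its own '\n', so no blank lines appear
            written += 1
    return written

# Usage with the sample text from the question
source = io.StringIO("good baby\nbad boy\ngood boy\nnormal boy\n")
target = io.StringIO()
count = remove_lines_containing(source, target, ['bad', 'naughty'])
print(target.getvalue(), end='')
print(count)
```

With real files, pass the file objects opened in the with statement instead of the StringIO objects.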

Method 2

You could simply skip writing the line to the new file instead of calling replace():

for line in infile:
    if 'bad' not in line and 'naughty' not in line:
        newopen.write(line)
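The same condition can also be handed to writelines() as a generator expression, so there is no explicit loop body. This is a sketch: the sample input is written to a temporary directory (an assumption so it runs on its own), and oldfile.txt and newfile.txt are just the names used elsewhere on this page.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'oldfile.txt')
    new_path = os.path.join(tmp, 'newfile.txt')

    # Sample input from the question, written to a temporary file
    with open(old_path, 'w') as f:
        f.write("good baby\nbad boy\ngood boy\nnormal boy\n")

    with open(old_path) as infile, open(new_path, 'w') as newopen:
        # writelines() pulls the kept lines from the generator one at a time
        newopen.writelines(
            line for line in infile
            if 'bad' not in line and 'naughty' not in line
        )

    with open(new_path) as f:
        result = f.read()

print(result, end='')
```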

Method 3

I have used this to remove lines containing unwanted words from text files:

bad_words = ['abc', 'def', 'ghi', 'jkl']

with open('List of words.txt') as badfile, open('Clean list of words.txt', 'w') as cleanfile:
    for line in badfile:
        clean = True
        for word in bad_words:
            if word in line:
                clean = False
                break  # no need to check the remaining words
        if clean:
            cleanfile.write(line)

Or, to do the same for every .txt file in a directory and its subdirectories:

import os

bad_words = ['abc', 'def', 'ghi', 'jkl']

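for root, dirs, files in os.walk(".", topdown=True):
    for file in files:
        # Only .txt files; skip the output written by an earlier run
        if file.endswith('.txt') and not file.startswith('clean '):
            # os.walk yields bare file names, so join each one with its directory
            path = os.path.join(root, file)
            clean_path = os.path.join(root, 'clean ' + file)
            with open(path) as infile, open(clean_path, 'w') as cleanfile:
                for line in infile:
                    clean = True
                    for word in bad_words:
                        if word in line:
                            clean = False
                            break
                    if clean:
                        cleanfile.write(line)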

I’m sure there must be a more elegant way to do it, but this did what I wanted it to.
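One possibly more elegant route for the directory case is pathlib: Path.rglob('*.txt') recurses like os.walk but yields full paths, so each file is opened from its own directory. This is a sketch assuming the same 'clean ' prefix for output files; clean_directory is a hypothetical name, and the temporary directory exists only so the example runs on its own.

```python
from pathlib import Path
import tempfile

def clean_directory(root, bad_words, prefix='clean '):
    """Write a filtered copy of every .txt file under root; return the new paths."""
    written = []
    # Build the list first so files created below are not picked up mid-loop
    for path in list(Path(root).rglob('*.txt')):
        if path.name.startswith(prefix):
            continue  # skip output from a previous run
        target = path.with_name(prefix + path.name)
        with path.open() as src, target.open('w') as dst:
            for line in src:
                if not any(word in line for word in bad_words):
                    dst.write(line)
        written.append(target)
    return written

# Usage on a throwaway directory containing one nested file
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp, 'notes')
    sub.mkdir()
    (sub / 'a.txt').write_text("keep me\nabc drop me\nkeep too\n")
    outputs = clean_directory(tmp, ['abc', 'def', 'ghi', 'jkl'])
    names = [p.name for p in outputs]
    cleaned = outputs[0].read_text()

print(names, repr(cleaned))
```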

Method 4

Today I needed to do something similar, so I wrote up a gist based on some research. I hope someone finds it useful!

import os

# Clear the terminal ('cls' on Windows, 'clear' elsewhere)
os.system('cls' if os.name == 'nt' else 'clear')

oldfile = input('{*} Enter the file (with extension) you would like to strip domains from: ')
newfile = input('{*} Enter the name of the file (with extension) you would like me to save: ')

emailDomains = ['windstream.net', 'mail.com', 'google.com', 'web.de', 'email', 'yandex.ru', 'ymail', 'mail.eu', 'mail.bg', 'comcast.net', 'yahoo', 'Yahoo', 'gmail', 'Gmail', 'GMAIL', 'hotmail', 'comcast', 'bellsouth.net', 'verizon.net', 'att.net', 'roadrunner.com', 'charter.net', 'mail.ru', '@live', 'icloud', '@aol', 'facebook', 'outlook', 'myspace', 'rocketmail']

print("\n[*] This script will remove records that contain the following strings:\n\n", emailDomains)

input("\n[!] Press Enter to start...\n")

linecounter = 0

with open(oldfile) as oFile, open(newfile, 'w') as nFile:
    for line in oFile:
        # Keep only records that contain none of the listed strings
        if not any(domain in line for domain in emailDomains):
            nFile.write(line)
            linecounter += 1
            print('[*] - {%s} Writing verified record to %s ---{ %s' % (linecounter, newfile, line.rstrip('\n')))

print('[*] === COMPLETE === [*]')
print('[*] %s was saved' % newfile)
print('[*] There are %s records in your saved file.' % linecounter)

Link to Gist: emailStripper.py

Best,
Az
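The domain list spells some entries several ways ('gmail', 'Gmail', 'GMAIL') to catch different capitalizations. A sketch of an alternative: lowercase the line and each pattern before comparing, which also catches spellings such as 'GMail' that the list does not include. The is_blocked helper and the short domain list below are illustrative, not part of the gist.

```python
def is_blocked(line, patterns):
    """Return True if the line contains any of the patterns, ignoring case."""
    lowered = line.lower()
    return any(p.lower() in lowered for p in patterns)

domains = ['gmail', 'yahoo', 'hotmail', '@aol']
records = [
    "alice@GMail.com\n",
    "bob@example.org\n",
    "carol@Yahoo.co.uk\n",
]
kept = [r for r in records if not is_blocked(r, domains)]
print(kept)
```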

Method 5

Use the python-textops package:

from textops import *

'oldfile.txt' | cat() | grepv('bad') | grepv('naughty') | tofile('newfile.txt')
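If adding a dependency is not an option, the same cat, grepv, tofile pipeline can be mimicked with chained generators in plain Python. This is a sketch, not the textops API: cat_lines, grep_v and to_file are hypothetical names, and grep_v here does a plain substring test. Each stage pulls lines from the previous one, so the input file is streamed rather than read all at once.

```python
import os
import tempfile

def cat_lines(path):
    """Yield the lines of a file one at a time."""
    with open(path) as f:
        yield from f

def grep_v(lines, word):
    """Yield only the lines that do not contain word (like grep -v)."""
    return (line for line in lines if word not in line)

def to_file(lines, path):
    """Write the lines to path and return how many were written."""
    count = 0
    with open(path, 'w') as f:
        for line in lines:
            f.write(line)
            count += 1
    return count

with tempfile.TemporaryDirectory() as tmp:
    old_path = os.path.join(tmp, 'oldfile.txt')
    new_path = os.path.join(tmp, 'newfile.txt')
    with open(old_path, 'w') as f:
        f.write("good baby\nbad boy\nnaughty boy\nnormal boy\n")

    # Stages compose right to left: read, drop 'bad', drop 'naughty', write
    n = to_file(grep_v(grep_v(cat_lines(old_path), 'bad'), 'naughty'), new_path)
    with open(new_path) as f:
        result = f.read()

print(n, repr(result))
```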

Method 6

The else is only attached to the last if, so a line that contains ‘bad’ but not ‘naughty’ still reaches the else and gets written. You want elif:

if 'bad' in line:
    pass
elif 'naughty' in line:
    pass
else:
    newopen.write(line)

Also note that I removed the line substitution, as you don’t write those lines anyway.
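Dropped back into the loop from the question, the elif version looks like the sketch below. io.StringIO objects stand in for the two files (an assumption so the snippet runs on its own; with real files, use open() as in the question). With the original if/if/else, a line such as ‘bad boy’ fails the ‘naughty’ test and falls through to the else, which is why it was still written.

```python
import io

# Stand-ins for open('./oldfile.txt') and open('./newfile.txt', 'w')
infile = io.StringIO("good baby\nbad boy\ngood boy\nnormal boy\n")
newopen = io.StringIO()

for line in infile:
    if 'bad' in line:
        pass  # skip: contains 'bad'
    elif 'naughty' in line:
        pass  # skip: contains 'naughty'
    else:
        newopen.write(line)  # only reached when neither word is present

print(newopen.getvalue(), end='')
```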

Method 7

Try this; it works well:

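import re

text = "good baby\nbad boy\ngood boy\nnormal boy\n"
# re.MULTILINE lets ^ match at the start of every line; \n? also removes the line break
text = re.sub(r"^.*bad.*\n?", "", text, flags=re.MULTILINE)
text = re.sub(r"^.*naughty.*\n?", "", text, flags=re.MULTILINE)
print(text)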
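Whether a line-based pattern like this works depends on re.MULTILINE: without it, ^ and $ anchor only at the start and end of the whole string, not at each line. A small comparison on the sample text from the question; the pattern is illustrative.

```python
import re

text = "good baby\nbad boy\ngood boy\nnormal boy\n"
pattern = r"^.*bad.*\n?"

# Without MULTILINE, ^ matches only at position 0, and line 1 has no 'bad'
single = re.sub(pattern, "", text)
# With MULTILINE, ^ matches at the start of every line
multi = re.sub(pattern, "", text, flags=re.MULTILINE)

print(repr(single))
print(repr(multi))
```

Without the flag, nothing is removed; with it, only the ‘bad boy’ line disappears.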

Method 8

A regex was a little quicker than the accepted answer on the 23 MB test file I used, but there isn’t much in it.

import re

bad_words = ['bad', 'naughty']

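# (?:...) is a non-capturing group; re.escape() protects any regex metacharacters in the words
regex = rf"^.*(?:{'|'.join(map(re.escape, bad_words))}).*\n?"
subst = ""

with open('oldfile.txt') as oldfile:
    lines = oldfile.read()

# flags must be passed by keyword: the 4th positional argument of re.sub is count
result = re.sub(regex, subst, lines, flags=re.MULTILINE)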

with open('newfile.txt', 'w') as newfile:
    newfile.write(result)
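A pitfall with re.sub: its fourth positional parameter is count, not flags. Writing re.sub(regex, subst, lines, re.MULTILINE) therefore sets a replacement limit of 8 (the integer value of re.MULTILINE) and leaves multiline mode off. Compiling the pattern once keeps the flag attached to the pattern and avoids the mix-up. A sketch on made-up sample lines:

```python
import re

bad_words = ['bad', 'naughty']
text = "good baby\nbad boy\nnaughty boy\nnormal boy\n"

# Compile once: the flag travels with the pattern, so it cannot be
# mistaken for re.sub's count argument
pattern = re.compile(
    r"^.*(?:" + "|".join(map(re.escape, bad_words)) + r").*\n?",
    re.MULTILINE,
)
result = pattern.sub("", text)

# The mistake to avoid: the flag's value ends up as count, MULTILINE stays off
wrong = re.sub(pattern.pattern, "", text, count=re.MULTILINE)

print(repr(result))
print(repr(wrong))
```

Here the mistaken call removes nothing, because without MULTILINE the ^ can only match before the first line, which contains neither word.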

Method 9

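to_skip = ("bad", "naughty")

with open("testin", "r") as handle, open("testout", "w") as out_handle:
    for line in handle:
        # split() with no argument splits on any whitespace and drops the trailing "\n",
        # so a skip word at the end of a line is still found (whole words only)
        if set(line.split()).intersection(to_skip):
            continue
        out_handle.write(line)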
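Method 9 compares whole words, while Methods 1 and 2 test for substrings, and the two give different results on words like ‘badly’. A sketch comparing them on a few made-up lines; the word-boundary regex (\b) is a swapped-in third option that matches whole words while ignoring punctuation such as ‘bad,’.

```python
import re

to_skip = ("bad", "naughty")
lines = ["bad boy\n", "badly drawn\n", "a bad, sad day\n", "good boy\n"]

# Substring test (Methods 1 and 2): 'badly' counts as a match
substring_kept = [line for line in lines if not any(w in line for w in to_skip)]

# Whitespace-split word test (Method 9): 'bad,' is not the word 'bad'
split_kept = [line for line in lines if not set(line.split()).intersection(to_skip)]

# Word-boundary regex: whole words only, punctuation does not get in the way
word_re = re.compile(r"\b(?:" + "|".join(to_skip) + r")\b")
regex_kept = [line for line in lines if not word_re.search(line)]

print(substring_kept)
print(split_kept)
print(regex_kept)
```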

Method 10

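bad_words = ['doc:', 'strickland:']

with open('linetest.txt') as oldfile, open('linetestnew.txt', 'w') as newfile:
    for line in oldfile:
        # line.strip() is empty for blank lines, so those are skipped as well
        if line.strip() and not any(bad_word in line for bad_word in bad_words):
            newfile.write(line)

Adding '\n' (the newline escape sequence) to bad_words does not just drop blank lines: almost every line ends with '\n', so nearly every line would be removed. Checking line.strip() skips only blank or whitespace-only lines.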

All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.
