Suppose I have a list of URLs in a text file:
google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo
I want to delete everything that comes after ‘.com’.
Expected Results:
google.com
unix.stackexchange.com
isuckatunix.com
I tried
sed 's/.com*//' file.txt
but it deleted .com as well.
Answers:
Method 1
To explicitly delete everything that comes after “.com”, just tweak your existing sed solution to replace “.com(anything)” with “.com”:
sed 's/\.com.*/.com/' file.txt
I tweaked your regex to escape the first period; otherwise it would have matched something like “thisiscommon.com/something”.
Note that you may want to further anchor the “.com” pattern with a trailing forward-slash so that you don’t accidentally trim something like “sub.com.domain.com/foo”:
sed 's/\.com\/.*/.com/' file.txt
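To see the difference between the unescaped, escaped, and slash-anchored patterns side by side, here is a minimal sketch using Python's re module instead of sed. The function names are made up for illustration, and Python's regex dialect is not identical to sed's, although these simple patterns behave the same way in both.

```python
import re

def trim_unescaped(line):
    # Unescaped '.' matches ANY character, so "scom" in "thisiscommon" matches first.
    return re.sub(r".com.*", ".com", line, count=1)

def trim_after_com(line):
    # Escaped '\.' matches only a literal period.
    return re.sub(r"\.com.*", ".com", line, count=1)

def trim_after_com_slash(line):
    # Only trim at a ".com" that is immediately followed by "/".
    return re.sub(r"\.com/.*", ".com", line, count=1)

if __name__ == "__main__":
    for url in ("thisiscommon.com/something", "sub.com.domain.com/foo"):
        print(url)
        print("  unescaped:     ", trim_unescaped(url))
        print("  escaped:       ", trim_after_com(url))
        print("  escaped + '/': ", trim_after_com_slash(url))
```

Only the slash-anchored version leaves "sub.com.domain.com" intact, which is the case the note above warns about.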
Method 2
You can use awk's field separator option (-F) in the following way:
$ cat file
google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo
$ cat file | awk -F '\.com' '{print $1".com"}'
google.com
unix.stackexchange.com
isuckatunix.com
Explanation:
NAME
awk - pattern scanning and processing language
-F fs
--field-separator fs
Use fs for the input field separator (the value of the FS predefined variable).
Because you want to delete everything after .com, -F '\.com' splits each line at .com, and print $1 outputs only the part before the first .com. $1".com" then appends .com again, which gives the expected output.
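The same split-and-reattach idea can be sketched in Python. This is an illustration of the logic only, not a replacement for the awk command; keep_through_com is a hypothetical name, and it splits on the literal string ".com" rather than a regular expression.

```python
def keep_through_com(line):
    # head is everything before the first ".com", like awk's $1
    head, _sep, _rest = line.partition(".com")
    # Re-attach the separator, like print $1".com"
    return head + ".com"

if __name__ == "__main__":
    for url in ("google.com/funny",
                "unix.stackexchange.com/questions",
                "isuckatunix.com/ireallydo"):
        print(keep_through_com(url))
```

As with the awk version, a line that contains no ".com" at all still gets ".com" appended, so this approach assumes every input line has the separator.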
Method 3
The best tool for non-interactive in-place file editing is ex.
ex -sc '%s/\(\.com\).*/\1/ | x' file.txt
If you’ve used vi and if you’ve ever typed a command that begins with a colon : you’ve used an ex command. Of course many of the more advanced or “fancy” commands you can execute this way are Vim extensions (e.g. :bufdo) and are not defined in the POSIX specifications for ex, but those specifications allow for a truly astonishing degree of power and flexibility in non-visual text editing (whether interactive or automated).
The command above has several parts.
-s enables silent mode to prepare ex for batch use (it suppresses output messages and the like).
-c specifies the command to execute once the file (file.txt, in this case) is opened in a buffer.
% is an address specifier equivalent to 1,$—it means that the following command is applied to all lines of the buffer.
s is the substitute command that you are likely familiar with already. It is commonly used in vi and has essentially identical features to the s command of sed, though some of the advanced regex features may vary by implementation. In this case from “.com” to the end of the line is replaced with just “.com”.
The vertical bar separates sequential commands to be executed. In many (most) ex implementations you can also use an additional -c option, like so:
ex -sc '%s/\(\.com\).*/\1/' -c x file.txt
However, this is not required by POSIX.
The x command exits, after writing any changes to the file. Unlike wq which means “write and quit”, x only writes to the file if the buffer has been edited. Thus if your file is unaltered, the timestamp will be preserved.
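The "write only if the buffer changed" behaviour of x can be imitated in Python. The sketch below is a rough analogue of the ex command, not how ex is implemented; trim_file_in_place is a hypothetical helper, and it reads the whole file into memory, which is assumed to be acceptable for a small list of URLs.

```python
import re
from pathlib import Path

def trim_file_in_place(path):
    """Trim everything after the first '.com' on each line.

    Returns True if the file was rewritten, False if it was left untouched.
    """
    p = Path(path)
    original = p.read_text()
    # '.' does not match newlines, so each match stays within a single line.
    edited = re.sub(r"(\.com).*", r"\1", original)
    if edited == original:
        return False  # nothing changed: skip the write, like ex's x command
    p.write_text(edited)
    return True
```

Calling trim_file_in_place("file.txt") a second time returns False and does not touch the file, so its modification time stays as it was.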
Method 4
Very quick, simple and dirty Python way:
#!/usr/bin/env python
import sys

# Print the part of each line before the first "/"
with open(sys.argv[1]) as file:
    for line in file:
        print(line.strip().split("/")[0])
Sample run
$ chmod +x removeStrings.py
$ ./removeStrings.py strings.txt
google.com
unix.stackexchange.com
isuckatunix.com
$ cat strings.txt
google.com/funny
unix.stackexchange.com/questions
isuckatunix.com/ireallydo
All methods were sourced from stackoverflow.com or stackexchange.com and are licensed under CC BY-SA 2.5, CC BY-SA 3.0, and CC BY-SA 4.0.