I want to show a simple, real world problem and solution I made today with Python.
Problem: My data has been corrupted (over 8,000 rows in Excel). Somewhere in my haste, I have scrambled first names and last names in the ‘first name’ and ‘last name’ field.
Solution: Find unique words from both fields, which leads to unique first and last names.
Copy the columns from Excel into a Notepad text file. Then run the below code on the text file in command prompt.
with open('file_name.txt',"r") as f: lines = f.readlines() lines = [line.split() for line in lines] new_line = list() edited_lines = list() for line in lines: # keep only the first appearance of a word on each line new_line = [word for word in words if word.strip() not in new_line] edited_lines.append(new_line) with open("edited_lines.txt",'w) as f: for line in edited_lines: line = ' '.join(line) +'\n' f.write(line)
As you can see to the right, this has done a good job of repairing the damage, but we still need to get separation of first and last names. Luckily, I had already written some very ugly code tthat I probably shoudn’t share (oh well!) that tries to match first and last names to columns appropriately and insert a semi-colon as a delimiter. Here is the code.
Note: You may need to use Excel’s “From Text” feature if pasting in data from text files does not work.