Fixing common errors in your novel with Find and Replace and Regex
21 Monday Oct 2013
Written by J. Abram Barneck in Writing Tips
Pretty much any author can use find and replace. But how many can use it effectively enough to save money? I can. If you read this article, you can, too.
This article includes simple find and replace suggestions. But most authors have never even heard of Regular Expressions. Regular Expressions, often called regex for short, are a techie's tool for finding a pattern of characters. With them you can fix a lot of issues yourself and be sure every instance of the pattern has been found, without waiting for an editor to catch them.
If you use Microsoft Word, you may not know that you can use regular expressions there. They are available, though not fully featured, in Advanced Find when you click More. You get several search options, and Word's patterns behave differently depending on whether Use Wildcards is checked.
Sigil, an open source eBook authoring tool, has full support for regular expressions. Because Sigil works on HTML, your patterns can also include HTML tags, such as </p>, which ends a paragraph.
So if you fix as many errors as possible with Find and Replace and regex, you make your editor's job easier. The editor won't have to spend time fixing them. If you pay your editor hourly, they will spend fewer hours and you will spend less money. The editor can also focus on content and real problems rather than stupid grammatical issues that a computer can fix automatically.
Here is a list of searches I have put together. For each one, I mark whether you can fix it with Find and Replace or whether you have to fix it manually.
Paragraphs that do NOT end with punctuation
(Sorry, this doesn’t fix missing punctuation between sentences in a paragraph.)
Regex: [A-Za-z]$
Sigil: [A-Za-z]</p>
Word: ^$^p
Replace: Manual, because you as the author must choose the punctuation, and because the search also returns lines that shouldn't have punctuation, such as chapter headings.
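If you keep a plain-text copy of your manuscript, you can run the same search outside Word or Sigil. Here is a minimal Python sketch using the standard re module. It assumes one paragraph per line, and the function name is just for illustration. It widens [A-Za-z]$ to ^.*[A-Za-z]$ so each hit returns the whole paragraph for you to review.

```python
import re

# ^.*[A-Za-z]$ : a whole line whose last character is a letter.
# re.MULTILINE makes ^ and $ match at the start and end of every line.
MISSING_END_PUNCT = re.compile(r"^.*[A-Za-z]$", re.MULTILINE)


def paragraphs_missing_punctuation(text):
    """Return each paragraph (line) that ends in a letter instead of punctuation."""
    return [m.group(0) for m in MISSING_END_PUNCT.finditer(text)]


sample = "Chapter One\nShe opened the door.\nHe never came back\n"
print(paragraphs_missing_punctuation(sample))
# ['Chapter One', 'He never came back']
```

The chapter heading is flagged too, which is exactly why this one stays a manual fix.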
One or more needless spaces at the end of a paragraph
Regex: [ \t]+$
Sigil: [ \t]+</p>
Word: ^w^p
Replace:
Regex: Yes. Leave the replacement blank. Replacing with nothing solves this.
Sigil: Replace with only the end tag: </p>
Word: Replace with only the end-of-paragraph symbol: ^p
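Here is a sketch of the same fix in Python. It assumes either a plain-text manuscript with one paragraph per line or Sigil-style HTML. The function names are hypothetical.

```python
import re

# [ \t]+$ : one or more spaces or tabs right before the end of a line.
TRAILING_SPACE = re.compile(r"[ \t]+$", re.MULTILINE)


def strip_trailing_spaces(text):
    """Plain text: delete spaces/tabs at the end of every paragraph."""
    return TRAILING_SPACE.sub("", text)


def strip_trailing_spaces_html(html):
    """Sigil-style HTML: delete spaces/tabs in front of </p>, keeping the tag."""
    return re.sub(r"[ \t]+</p>", "</p>", html)


print(repr(strip_trailing_spaces("One.  \nTwo.\t\nThree.")))  # 'One.\nTwo.\nThree.'
print(strip_trailing_spaces_html("<p>Hello.   </p>"))  # <p>Hello.</p>
```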
One or more needless spaces at the start of a paragraph
Regex: ^\s+
Sigil: <p>\s+
Word: ^p^w
Replace:
Regex: Yes. Leave the replacement blank. Replacing with nothing solves this.
Sigil: Replace with only the start tag: <p>
Word: Replace with only the end-of-paragraph symbol: ^p
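Here is the equivalent Python sketch, under the same assumptions and with hypothetical function names. One deliberate change: the plain-text version uses [ \t] instead of \s. In Python, \s also matches line breaks, so ^\s+ in multiline mode could delete the blank lines between paragraphs.

```python
import re

# ^[ \t]+ : spaces or tabs at the start of a line (never the line break itself).
LEADING_SPACE = re.compile(r"^[ \t]+", re.MULTILINE)


def strip_leading_spaces(text):
    """Plain text: delete spaces/tabs at the start of every paragraph."""
    return LEADING_SPACE.sub("", text)


def strip_leading_spaces_html(html):
    """The article's Sigil pattern; assumes bare <p> tags with no attributes."""
    return re.sub(r"<p>\s+", "<p>", html)


print(repr(strip_leading_spaces("  One.\n\tTwo.\n\nThree.")))  # 'One.\nTwo.\n\nThree.'
print(strip_leading_spaces_html("<p>  Hello.</p>"))  # <p>Hello.</p>
```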
All sentences that do not start with a capital letter
Regex: [A-Za-z][.?!]["”]*\s+[a-z]
Sigil: [A-Za-z][.?!]["”]*\s+[a-z]
Word: Check the "Use Wildcards" checkbox: [A-z][\!\.\?] [a-z]
Replace: Manual
To cut down on false positives, you can probably search with only ? and !, dropping the period from the pattern.
Important! Periods follow too many acronyms and abbreviated titles (Mrs., Mr., Jr., etc.) for you to trust this search without looking at each match.
Here are a couple of sentences that would be wrong if the next character after the punctuation were capitalized.
- Techroscopic Inc. was the name of his company.
- “Awesome!” he said.
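To see how noisy this search is, here is a Python sketch that runs the article's pattern, unchanged, over sentences like the ones above. It only lists matches and never replaces anything. The function name is hypothetical.

```python
import re

# A letter, then . ? or !, then any closing quotes, then whitespace,
# then a lowercase letter: the article's pattern.
LOWERCASE_START = re.compile(r'[A-Za-z][.?!]["”]*\s+[a-z]')


def lowercase_sentence_starts(text):
    """Return every suspicious spot; each one still needs a human decision."""
    return [m.group(0) for m in LOWERCASE_START.finditer(text)]


sample = 'He left. then he came back. “Awesome!” he said. Techroscopic Inc. was his company.'
print(lowercase_sentence_starts(sample))
# ['t. t', 'e!” h', 'c. w']  (only the first is a real error)
```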
All end quotes followed by a capital letter, to see if the letter should be lowercase.
Example to be fixed: “Go away.” She said.
After it was fixed: “Go away,” she said.
Tip: This works best if your quotes are smart quotes. With straight quotes, you can't tell whether a quote is opening or closing. In Word, find " and replace it with the same character, ", and Word, with default settings, will change all your quotes to smart quotes.
Regex: [^,]["”] [A-Z]
Sigil: [^,]["”] [A-Z]
Word: Check the "Use Wildcards" checkbox: [A-z][\!,]["”] [A-Z]
Replace: Manual
You just have to look at each one.
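Here is a Python sketch of the same search, assuming the text already uses curly quotes. It only reports matches, because you have to decide whether a capital after a closing quote is wrong. The function name is illustrative.

```python
import re

# Any character except a comma, a closing quote (straight or curly),
# a space, then a capital letter: the article's pattern.
QUOTE_THEN_CAPITAL = re.compile(r'[^,]["”] [A-Z]')


def quotes_before_capitals(text):
    return [m.group(0) for m in QUOTE_THEN_CAPITAL.finditer(text)]


sample = '“Go away.” She said. “Hello,” he said. “Stop!” Then she left.'
print(quotes_before_capitals(sample))
# ['.” S', '!” T']
```

The second hit is correct as written, which is why each match needs a look.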
All opening quotes followed by a missing closing quote or an embedded quote.
Example of a missing quote: “Hello, she said.
Example of embedded quotes (should be single quotes): “The word “death” means separation.”
Tip: As with the previous search, this works best with smart quotes, so convert your quotes in Word first.
Important! A match is not always an error. When a speaker talks for multiple paragraphs, the closing quote is left off every paragraph except the last.
Regex: “[^”]+(“|$)
Sigil: “[^”]+(“|</p>)
Word: Check the "Use Wildcards" checkbox: ["“][!"”]@^13
Replace: Manual. You have to verify that the dialog doesn’t cross paragraphs.
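In Python, a negated class like [^”] also matches line breaks, so the sketch below adds \n to it to keep each match within one paragraph (one line). That change and the function name are assumptions for this sketch, not part of the original search.

```python
import re

# An opening curly quote, then characters that are neither a closing quote
# nor a line break, ending at another opening quote or at the end of the line.
UNCLOSED_QUOTE = re.compile(r"“[^”\n]+(“|$)", re.MULTILINE)


def unclosed_quotes(text):
    return [m.group(0) for m in UNCLOSED_QUOTE.finditer(text)]


sample = "“Hello, she said.\n“The word “death” means separation.”\n“Fine,” he said."
print(unclosed_quotes(sample))
# ['“Hello, she said.', '“The word “']
```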
All closing quotes without opening quotes
Some examples of bad closing quotes.
Example 1 – Missing opening quote: “Hello,” she said. How have you been?”
Example 2 – Extra quote that shouldn't be there: “Hello,” she said.”
Regex: (^|”)[^“]+”
Sigil: (<p>|”)[^“]+”
Word: Check the "Use Wildcards" checkbox: [”^13][!“]@”
Replace: Manual
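Here is the same idea as a Python sketch. As before, \n is added to the negated class (my change) so a match can't run across paragraphs, and the function name is hypothetical.

```python
import re

# Start of a line or a closing quote, then text with no opening quote and no
# line break, then a closing quote.
STRAY_CLOSE = re.compile(r"(^|”)[^“\n]+”", re.MULTILINE)


def stray_closing_quotes(text):
    return [m.group(0) for m in STRAY_CLOSE.finditer(text)]


sample = (
    "“Hello,” she said. How have you been?”\n"
    "“Hello,” she said.”\n"
    "“Hello,” she said. “How are you?”"
)
print(stray_closing_quotes(sample))
# ['” she said. How have you been?”', '” she said.”']
```

The third line is punctuated correctly and produces no match.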
Any quotes that aren’t smart quotes
Regex: ["']
Sigil: ^[^>]*<p[^>]*>[^>]*(["'])[^>]*
Word: "
Replace: Manual, except in Word. In Word, replace all quotes with themselves and they will all be switched to smart quotes (unless you turned that setting off).
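In Python you can get the same effect as the Sigil pattern, which skips quotes inside tag attributes, by stripping tags from each line before searching. Here is a sketch that assumes one HTML paragraph per line. It also flags straight apostrophes, which you usually want to convert as well. The names are hypothetical.

```python
import re

STRAIGHT_QUOTE = re.compile(r"[\"']")  # straight double quote or straight apostrophe
TAG = re.compile(r"<[^>]*>")           # any HTML tag, including its attributes


def lines_with_straight_quotes(html):
    """Return 1-based line numbers whose visible text has a straight quote.

    Tags are removed first so attribute quotes such as class="body" are ignored.
    """
    hits = []
    for number, line in enumerate(html.splitlines(), start=1):
        if STRAIGHT_QUOTE.search(TAG.sub("", line)):
            hits.append(number)
    return hits


sample = (
    '<p class="body">“Fine,” she said.</p>\n'
    '<p class="body">"Fine," he said.</p>\n'
    "<p>It's late.</p>"
)
print(lines_with_straight_quotes(sample))  # [2, 3]
```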
A comma after a conjunction
This is a stylistic choice, as it isn’t exactly wrong, but I prefer never to have these.
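Regex: \b(but|and|so|which|yet|or|except),
Sigil: \b(but|and|so|which|yet|or|except),
Word: Word's wildcards have no | (or) operator, so search for each conjunction and its comma separately, for example: but,
Replace: Manual. Either move the comma to before the conjunction or delete it.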
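Here is a Python sketch of the same search, with an illustrative function name. The \b at the start matters: without it, the "so," inside "also," would match. Like the original pattern, it is case-sensitive, so a capitalized "And," is not caught.

```python
import re

# \b (word boundary) keeps "so," from matching inside "also,".
CONJUNCTION_COMMA = re.compile(r"\b(but|and|so|which|yet|or|except),")


def conjunction_commas(text):
    return [m.group(0) for m in CONJUNCTION_COMMA.finditer(text)]


print(conjunction_commas("She ran, but, she fell. It was also, yet, nothing."))
# ['but,', 'yet,']
```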
Find character names you write inconsistently
I build the pattern from three letters: the first letter, a middle consonant I always get right, and the last letter. If your misspellings change the first or last letter, you'll need to work out your own pattern.
Examples:
- Aiden
- Adien
- Aeden
- Adan
- Adin
- Adn
Regex: A\w*d\w*n
Sigil: A\w*d\w*n
Word: Check the "Use Wildcards" checkbox: A([aei]@)[d]([aei]@)n
Replace: Aiden (or your correct spelling)
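Because A\w*d\w*n can also match ordinary words (Abandon, for example), a blind Replace All is risky. This Python sketch adds \b word boundaries and uses hypothetical helper names. It first counts the spellings it finds, then replaces all of them except the words you tell it to keep.

```python
import re
from collections import Counter

# The article's pattern with \b added so it only matches whole words.
AIDEN_VARIANTS = re.compile(r"\bA\w*d\w*n\b")


def count_variants(text):
    """Tally every spelling the pattern catches, so you can see what to fix."""
    return Counter(AIDEN_VARIANTS.findall(text))


def fix_variants(text, correct, keep=()):
    """Replace each match with `correct`, except words listed in `keep`."""
    return AIDEN_VARIANTS.sub(lambda m: m.group(0) if m.group(0) in keep else correct, text)


text = "Aiden waved. Adien laughed. Aiden sat. Adn left. Abandon hope."
print(count_variants(text))
print(fix_variants(text, "Aiden", keep={"Abandon"}))
# Aiden waved. Aiden laughed. Aiden sat. Aiden left. Abandon hope.
```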
Example 2
- Neihan
- Niehan
- Nihan
- Nehan
- Neihen
- Niehen
- Nihen
- Nehen
Regex: N[ie]{1,2}h[ae]n
Sigil: N[ie]{1,2}h[ae]n
Word: Check the "Use Wildcards" checkbox: N([ie]@)h[ae]n
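Here is a quick Python check that the pattern catches every listed variant but leaves a different name like Nathan alone. It assumes Neihan is the correct spelling, so swap in yours. The \b boundaries and the function name are additions for this sketch.

```python
import re

# The article's pattern, with \b so it only matches whole words.
NEIHAN_VARIANTS = re.compile(r"\bN[ie]{1,2}h[ae]n\b")

listed = ["Neihan", "Niehan", "Nihan", "Nehan", "Neihen", "Niehen", "Nihen", "Nehen"]
print(all(NEIHAN_VARIANTS.fullmatch(name) for name in listed))  # True


def normalize_neihan(text, correct="Neihan"):  # "Neihan" assumed to be the right spelling
    return NEIHAN_VARIANTS.sub(correct, text)


print(normalize_neihan("Nihen and Nehan met Nathan."))
# Neihan and Neihan met Nathan.
```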
Expect me to keep updating this article. The next step is to create a tool that finds these sentences for me (or for you, if I share it). I already have a tool; now it's just a matter of using it.
10 Comments
CM said:
January 23, 2014 at 12:46 pm
Looking forward to the help with punctuation at the end of sentences. The help you have given so far has proven to be extremely useful in both Word and Sigil. Excellent work. Hope the end of sentence punctuation help comes soon.
J. Abram Barneck said:
January 23, 2014 at 2:14 pm
What exact help are you looking for? What is the punctuation problem you want to solve?
Are you looking to fix missing punctuation at the end of a sentence? Then you would do this:
Sigil: [^.!?] [A-Z]
Word: [!.!?] [A-Z]
“This is an example of a missing punctuation at the end of the sentence There should have been a punctuation.”
“This is an example of a sentence that will result in a false positive. I love Michelle and Aiden and Lincoln.”
You will have so many false positives that this search becomes useless. However, you can add a bunch of “exceptions” which are actually negative look-aheads in Regex terms.
Sigil: [^.!?] (?!Michelle|Aiden|Lincoln)[A-Z]
Word: Unknown
Notice that you can add as many exceptions as you want to regex. Just add a pipe character “|” and then the proper name. You may have 100 proper names that you have to add. You may also want to add Mr. and Mrs. and Dr. and such.
However, while this will help alleviate false positives, it will actually make you miss possible problematic sentences like this one:
“I should end this sentence with a punctuation Michelle will be mad.”
If your goal is perfection, then this problem cannot be solved with find and replace or Regex. If your goal is just to fix as many issues as you can before sending to the editor so your editor can spend less time on stupid grammar issues and more time helping your novel’s content be the best it can be, then this is still valid.
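Here is a Python sketch of both searches from this reply: the plain one, and the one with name exceptions built into a negative look-ahead. The function name is hypothetical. Names are passed through re.escape so entries like Mr. work. The last example shows the trade-off described above: the missing period before Michelle is no longer caught.

```python
import re


def find_run_ons(text, exceptions=()):
    """Flag 'non-punctuation, space, capital letter' spots that may lack a period.

    `exceptions` are proper names to skip; they become a negative look-ahead.
    """
    if exceptions:
        alternatives = "|".join(re.escape(name) for name in exceptions)
        pattern = rf"[^.!?] (?!{alternatives})[A-Z]"
    else:
        pattern = r"[^.!?] [A-Z]"
    return [m.group(0) for m in re.finditer(pattern, text)]


names = ["Michelle", "Aiden", "Lincoln"]
print(find_run_ons("I love Michelle and Aiden and Lincoln."))         # ['e M', 'd A', 'd L']
print(find_run_ons("I love Michelle and Aiden and Lincoln.", names))  # []
print(find_run_ons("I should end this sentence with a punctuation Michelle will be mad.", names))  # []
```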
CM said:
January 25, 2014 at 12:27 pm
Yes that is what I was meaning. Issues such as “He walked down the street to deliver the paper After he was done he noticed it was starting to drizzle.” See how the punctuation (period) would be missing between “paper” and “After” where one sentence ends and another one goes forward when it should be “deliver the paper. After he was done”.
Jackie said:
February 17, 2014 at 3:23 pm
I love you. 😉 I was having a serious issue with missing quotation marks, and this is a 600-page book, so you have saved me an astounding amount of time. I learned that regular expressions could help me with this problem, so I was starting from the very beginning and trying to school myself in regex 101. I am no programmer, but I think I have potential, as I’m quite the researcher and actually enjoyed learning about regex. But there’s a lot to learn, and your expressions have fast tracked this process for me. THANK YOU SO MUCH!!
rl said:
November 24, 2014 at 1:15 pm
hello, I’m looking to fix three errors in an epub conversion using regex, and I’m having a little difficulty.
The first is to eliminate the / that appears in random words. Example, th/ere i/s sl/ashes in these sen/tenc/es
The second is to eliminate any question marks that are at the end of each paragraph. The paragraph is two sentences. This is the last one?
And the third is to change any capital O’s to zeros.
Any help you could offer me would be great
J. Abram Barneck said:
November 24, 2014 at 5:13 pm
For the slash, does your novel have any slashes that you should keep? My guess is no. In that case, you don't need regex; just replace every / with nothing. Even if you do have slashes to keep, find them in your original source. There will likely be only a few, so re-add them by hand. If your original source has a lot of slashes, then regex is not your tool. Regex can't tell the difference between a correct slash, as in correct/right, and a bad one, as in b/a/d or wro/ng. The only way to tell the difference is that one slash is surrounded by real words, whereas the other is not. You would need a natural language parser for that, which requires coding.
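In Python, the replace-everything approach is a plain string replacement, with no regex needed. The second helper is a hypothetical addition: it lists the slash-containing words in your original source so you know which ones to re-add.

```python
import re


def remove_slashes(text):
    """Delete every slash (fine when the book has no slashes worth keeping)."""
    return text.replace("/", "")


def slash_words(source_text):
    """List tokens in the ORIGINAL source that contain a slash, to re-add by hand."""
    return re.findall(r"\S*/\S*", source_text)


print(remove_slashes("th/ere i/s sl/ashes in these sen/tenc/es"))
# there is slashes in these sentences
print(slash_words("Pick yes/no, then and/or."))  # ['yes/no,', 'and/or.']
```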
J. Abram Barneck said:
November 24, 2014 at 5:19 pm
Well, if you are in Sigil, your paragraph will be HTML tagged using the p tag:
You want to find this:
Gregg Bell said:
August 8, 2015 at 1:37 pm
Thanks so much for this J. Abram! Have been looking for something like this for a long time. Really appreciate you sharing it!
Jörg Malek said:
April 2, 2018 at 1:05 pm
You are the man!
The issue: Multiple dialogs spanning across paragraphs in an epub with over 1000 pages *sigh*.
I was googling for hours, tearing my hair out, banging my head against the wall and so on – and then your hint (#6):
Sigil: “[^”]+(“|</p>)
This saved me – thank you so much for sharing 🙂
J. Abram Barneck said:
April 2, 2018 at 1:39 pm
Glad it helped you!