Synonym list sorting job challenge got me stumped
nambu
- Joined: Jan 7, 2010
- Posts: 21
This one has got me stumped. I'm sure there must be a way to do this in WinAutomation, but I'm having trouble working it out. Any help is greatly appreciated!
I have a synonym list containing 27,462 lines of synonyms and phrases in pipe-delimited format, like this:
big|large|huge
a big cheese|a bigwig
a big shot|somebody
giant|enormous|huge|gargantuan
a big shot|someone
sway|plead with|argue|win over|make somebody believe you
sway|win over
There are a lot of repetitions in the list, so I want to cut down the number of lines by finding any synonym that appears more than once. Then I want to join its line with the earlier line where it also appears.
For example, the word "huge" appears twice in the example list above. I would like the job to remove line #4, and join it with line #1 (after removing the duplicate word "huge"). Same with the other duplicates. So end result would be this:
big|large|huge|giant|enormous|gargantuan
a big cheese|a bigwig
a big shot|somebody|someone
sway|plead with|argue|win over|make somebody believe you
The problem I'm having is that if I do a For Each loop, or a Loop based on ItemsCount, it gets thrown off each time an item is removed from the list. I've tried to get around this by subtracting 1 from the LoopIndex each time I remove a list item, but I get index-out-of-range errors.
Any ideas on how to accomplish a job like this? I attached the synonym file here just in case.
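The index problem described above is not specific to WinAutomation: removing an item shifts every later item down by one, so a forward loop skips entries or runs past the end. One common workaround is to walk the list from the last index to the first, so a removal only shifts items that have already been visited. Below is a minimal Python sketch of that idea, not a WinAutomation job; the function name drop_repeats_in_place is an illustrative assumption, and it only removes exact repeats rather than merging lines.

```python
def drop_repeats_in_place(items):
    """Remove later repeats of an item, keeping the first occurrence.

    Iterates from the last index down to 0, so deleting items[i]
    never changes the position of an item that is still to be visited.
    """
    for i in range(len(items) - 1, -1, -1):
        # Delete this item if an identical one exists earlier in the list.
        if items[i] in items[:i]:
            del items[i]
    return items


if __name__ == "__main__":
    print(drop_repeats_in_place(["a", "b", "a", "c", "b"]))
```

The slice lookup makes this quadratic, which is fine for illustration but slow on a 27,462-line file.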
kimc
- Joined: Jan 13, 2011
- Posts: 298
Nambu -
Welcome to the forum. This is a really cool project, by the way. Since you have already done some work, my first guess on how to approach it is to create a new list and start populating it with candidate items. If an item is duplicated, skip it, but start a separate loop to gather the remaining items on its line and add them to the main list. I know this is kind of generic. Is the first word in the pipe-delimited list always the main or parent word, with any other occurrence being a synonym? I'm curious how the parent or root word is determined.
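The new-list approach suggested above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a WinAutomation job: the function name merge_synonym_lines and the term_to_group lookup table are hypothetical. Nothing is ever removed from the list being looped over, which sidesteps the index problem. One simplification to be aware of: a line that shares terms with two different groups is merged only into the first group it matches.

```python
def merge_synonym_lines(lines):
    """Merge pipe-delimited synonym lines that share an exact term."""
    groups = []         # merged groups; each is a list of terms in first-seen order
    term_to_group = {}  # exact term -> index of its group in `groups`

    for line in lines:
        terms = [t.strip() for t in line.split("|") if t.strip()]
        if not terms:
            continue  # skip blank lines

        # Find the first term that already belongs to an earlier group.
        target = next(
            (term_to_group[t] for t in terms if t in term_to_group), None
        )
        if target is None:
            groups.append([])
            target = len(groups) - 1

        group = groups[target]
        for t in terms:
            if t not in group:  # drop the duplicate term, keep the rest
                group.append(t)
            term_to_group.setdefault(t, target)

    return ["|".join(g) for g in groups]


if __name__ == "__main__":
    sample = [
        "big|large|huge",
        "a big cheese|a bigwig",
        "a big shot|somebody",
        "giant|enormous|huge|gargantuan",
        "a big shot|someone",
        "sway|plead with|argue|win over|make somebody believe you",
        "sway|win over",
    ]
    for merged in merge_synonym_lines(sample):
        print(merged)
```

Run on the seven sample lines from the first post, this yields the four merged lines shown there as the desired result.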
nambu
- Joined: Jan 7, 2010
- Posts: 21
Hi kim c,
Thanks for your reply. I had tried making a new list before starting this thread, but was going about it the wrong way. Reading your post made me give it another try, and I found what I was doing wrong. The thing that messed me up was checking for duplicates in the original list instead of the new list. When I checked the new list instead, I got everything working.
Another problem was that I was searching the text of each line for a match, so the word |big| would also turn up as a duplicate inside |a big shot|. Anyway, I made a job which uses the "Subtract Lists" action to remove any exact duplicates and add the rest to the line where the duplicate was found.
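The false match described above is the difference between a substring test and an exact match on whole pipe-separated terms. A small Python sketch of the distinction (the helper name has_term is an illustrative assumption, and this is not how the "Subtract Lists" action is implemented):

```python
def has_term(line, term):
    """True only if `term` is one of the whole pipe-separated entries."""
    return term in line.split("|")


if __name__ == "__main__":
    line = "a big shot|somebody"
    # Substring test: matches "big" inside the phrase "a big shot".
    print("big" in line)
    # Exact-term test: "big" is not an entry of its own on this line.
    print(has_term(line, "big"))
    print(has_term(line, "a big shot"))
```

Splitting on the pipe first means a phrase only matches another entry that is identical to it.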
The job is a little crude, and maybe someone can suggest a way to streamline what I'm doing. Took about 23 hours for the whole list to be sorted, but it got done. Probably would've taken me 23 weeks to do it all by hand without my trusty friend WinAutomation. :)
I'll post the job I made here, in case anyone's interested in something like this.
Also, to answer your question: the synonym file is for a plugin called CyberSEO, which I'm using with WordPress. When a word in the blog post text matches an entry in the synonym file, the plugin swaps it at random for one of the synonyms on the same line. For example, with these two lines:
giant|enormous|huge|gargantuan
man|guy|dude
So, the text "What a huge man."
Might be:
What a giant man.
What a huge dude.
What a gargantuan guy.
There is no main or parent word in the above example, but you can make one by adding a ">" character in front of the first word. For example:
>man|guy|dude
In this case, the line will be "synonymized" only if the word "man" appears in the body text.
It's a pretty cool plugin for anyone using WordPress, and it can do a lot more than swap synonyms.
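To make the behavior described above concrete, here is a rough Python model of it. This is not CyberSEO's code or API; the names parse_rules and spin are hypothetical, and the sketch assumes single-word matching only (multi-word phrases such as "a big shot" are not handled) and that the original word is itself one of the possible random choices.

```python
import random
import re


def parse_rules(lines):
    """Turn synonym-file lines into (triggers, choices) pairs.

    A line starting with ">" is triggered only by its first term;
    otherwise any term on the line can trigger a swap.
    """
    rules = []
    for line in lines:
        anchored = line.startswith(">")
        terms = line.lstrip(">").split("|")
        triggers = terms[:1] if anchored else terms
        rules.append((triggers, terms))
    return rules


def spin(text, rules, rng=random):
    """Replace each matching word with a random term from its line."""
    def swap(match):
        word = match.group(0)
        for triggers, choices in rules:
            if word in triggers:
                return rng.choice(choices)
        return word  # no rule applies; leave the word unchanged

    return re.sub(r"[A-Za-z']+", swap, text)


if __name__ == "__main__":
    rules = parse_rules(["giant|enormous|huge|gargantuan", ">man|guy|dude"])
    print(spin("What a huge man.", rules))
    # "guy" is not the parent word of its line, so it is left alone.
    print(spin("What a guy.", rules))
```

The output of the first call varies from run to run, since the replacement is chosen at random.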