Say you have a block of text and you need to identify some phrases that may or may not be in it. What is the most efficient way of doing this? Phrases are given priority based on the number of words they contain.
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer sit amet lacus. Fusce erat. Proin vel arcu quis justo viverra fermentum. Praesent ante mi, pretium vitae, dapibus ut, laoreet vitae, lacus. Aenean non diam. Sed non ipsum. In hac habitasse platea dictumst. Donec tincidunt mollis dui. Praesent magna mauris, elementum sed, cursus non, sodales quis, massa. Vestibulum mattis volutpat leo. Proin ornare ipsum ac justo. Quisque accumsan.
DB structure
Code:
phrase                        word_count
Lorem ipsum dolor sit amet    5
Lorem ipsum dolor             3
sit amet                      2
Lorem                         1
The phrase "Lorem ipsum dolor sit amet" would take preference over "Lorem ipsum dolor" and "Lorem".
You could iterate through all of the phrases (starting with the longest) and use InStr to test whether each one is present. For the purposes of this example a detected phrase is removed from the text, i.e. if "Lorem ipsum dolor" is detected you don't have to worry about "Lorem" being detected in the same place later on, but other occurrences of "Lorem" would still be detected if they exist. This method would run into problems with hundreds of thousands of phrases, because every phrase needs its own scan of the text.
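To make the longest-first idea concrete, here is a minimal Python sketch of it. It is only an illustration: the function name `find_phrases_longest_first` and the placeholder character are my own assumptions, a regular expression stands in for InStr, and the phrase list is assumed to be already loaded from the database into memory.

```python
import re

# Hypothetical marker that overwrites a detected phrase so that shorter
# phrases cannot match in the same place afterwards.
PLACEHOLDER = "\x00"


def find_phrases_longest_first(text, phrases):
    """Return (phrase, occurrences) pairs, testing longer phrases first."""
    # Sort by number of words, longest first, like the word_count column.
    ordered = sorted(phrases, key=lambda p: len(p.split()), reverse=True)
    found = []
    for phrase in ordered:
        # \b stops "Lorem" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b")
        # Every phrase costs one full pass over the text.
        text, count = pattern.subn(PLACEHOLDER, text)
        if count:
            found.append((phrase, count))
    return found
```

The loop does one pass over the text per phrase, so the cost grows with the number of phrases multiplied by the length of the text, which is where hundreds of thousands of phrases would start to hurt.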
At the moment I am breaking the block of text up into words and storing them individually in an array. I then use a LIKE query to find the phrases in the database that begin with the first word in the array. If rows are returned, I check whether the next word is the second word of any of those rows; if it is, I remove it from its position in the array, append it to the preceding array element and start the process again (recursive function). I don't have this working fully yet, but it seems like a very inefficient way of doing things.
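The same word-by-word idea can be sketched without a query per word. The Python below is not the code described above: it swaps the LIKE queries for an in-memory trie keyed by whole words, assuming the phrases can be loaded from the database once up front. The names `build_trie`, `match_phrases` and the `END` marker are made up for the illustration.

```python
import re

# Marker key for "a complete phrase ends here". It can never collide with a
# real word because the tokenizer below only produces \w+ tokens.
END = "$"


def build_trie(phrases):
    """Build a nested-dict trie in which each level is keyed by one word."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.split():
            node = node.setdefault(word, {})
        node[END] = phrase
    return root


def match_phrases(text, trie):
    """Scan left to right, taking the longest phrase that starts at each word.

    Returns (phrase, word_index) pairs. Simplification: punctuation is
    dropped, so a phrase may match across a comma or a full stop.
    """
    words = re.findall(r"\w+", text)
    matches = []
    i = 0
    while i < len(words):
        node, best, j = trie, None, i
        # Walk down the trie for as long as the following words keep matching.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if END in node:
                best = (node[END], j)  # longest complete phrase so far
        if best:
            matches.append((best[0], i))
            i = best[1]  # skip the words consumed by the match
        else:
            i += 1
    return matches
```

Each word of the text is looked at only a few times however many phrases there are, so the work depends mostly on the length of the text. One difference to be aware of: this picks the longest phrase starting at the current word, which is not always the same as removing the globally longest phrases first when two phrases overlap.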
ASP, VB.NET, C#, Java, pseudocode or any other suggestions welcome.