[Vtigercrm-developers] Vlastic Search

Alan Bell alan.bell at libertus.co.uk
Tue Apr 26 08:20:22 GMT 2016


We have been using Elasticsearch (and Sphinx before it) since before 
6.x, when search kind of worked but was really slow at scale. It ran a 
SQL LIKE "%value%" query against every field in every module, which is 
an unindexed brute-force search. With hundreds of thousands of records 
that was a serious load, causing massive CPU spikes on the server and 
slowing the system down for other users. Yes, searching was rather 
broken in 6.x (it is now fast, but doesn't search much), so we put a 
lot of work into turning search into a store-quality module that 
respects security settings and so on.

We think that an external, specialised full-text search tool would be 
of value at scale even if the MySQL-based searching was improved: I 
don't think any MySQL LIKE query whose pattern starts with % can use an 
index, so it is inherently a brute-force search. It might well be 
possible to do something in vtiger to nominate fields to be searched on 
(so people could search for phone numbers, for example) and to search 
more than it does at the moment, but you are always trading performance 
off against the amount you search.
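
To illustrate the difference between a brute-force scan and an indexed 
lookup, here is a minimal Python sketch. It is not vtiger or 
Elasticsearch code: the sample records and the function names 
(scan_search, build_index, index_search) are made up for the example, 
and a real full-text engine does far more (analysis, ranking, on-disk 
structures).

```python
from collections import defaultdict

# Hypothetical sample records: (record id, text of one field).
RECORDS = [
    (1, "Acme Ltd head office"),
    (2, "Libertus Solutions"),
    (3, "Acme support contract"),
]


def scan_search(records, value):
    """Brute force, in the spirit of LIKE '%value%': every row is inspected."""
    needle = value.lower()
    return [rid for rid, text in records if needle in text.lower()]


def build_index(records):
    """Build an inverted index: lower-cased word -> set of record ids."""
    index = defaultdict(set)
    for rid, text in records:
        for word in text.lower().split():
            index[word].add(rid)
    return index


def index_search(index, word):
    """A single dictionary lookup; no pass over all the records."""
    return sorted(index.get(word.lower(), set()))


INDEX = build_index(RECORDS)
print(scan_search(RECORDS, "acme"))
print(index_search(INDEX, "acme"))
```

Both calls find records 1 and 3, but the scan touches every record on 
every search, while the index pays that cost once at build time. Note 
that this simple index matches whole words only, not arbitrary 
substrings, which is part of the trade-off.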

We should expand our test data, because it looks more impressive at 
scale: basically it doesn't slow down. The performance you see with 
just a few thousand records stays the same when the data set gets huge. 
The indexing times we talked about refer to the initial index load. 
What really matters is the search response time once the data is 
indexed, and for that it is *fast*. It is also not a heavy load on the 
MySQL process (there is some load, as we check permissions for each 
record using the Vtiger API, but that is already heavily optimised).
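
As a rough illustration of that last step, the sketch below filters the 
ids returned by a search server through a per-record permission check. 
It is only an assumption about the general shape, not the actual module 
code: filter_permitted, can_view and the ACCESS table are hypothetical 
stand-ins, and the real check goes through the Vtiger API.

```python
def filter_permitted(hit_ids, user, can_view):
    """Keep only the hits the user may see, preserving the ranked order."""
    return [rid for rid in hit_ids if can_view(user, rid)]


# Hypothetical access table standing in for the CRM's permission check.
ACCESS = {"alice": {1, 2, 3}, "bob": {2}}


def can_view(user, record_id):
    """Return True if the (hypothetical) user may view the record."""
    return record_id in ACCESS.get(user, set())


hits = [3, 1, 2]  # record ids as ranked by the search server
print(filter_permitted(hits, "bob", can_view))
```

The database is only consulted for the handful of records that matched, 
rather than being scanned for the search itself.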

There are some core improvements that absolutely belong in the main 
code, like the to_html performance, encrypted passwords for the portal 
etc. Integrating a dedicated external search server isn't quite like those.

Alan.


On 26/04/16 00:14, Błażej Pabiszczak wrote:
>
> I always have something to say, and since everything has its pros and 
> cons, it won't be different this time.
>
> I spent a couple of minutes thinking about why you created this 
> functionality. The only reasons I could come up with are:
> 1. The producer's reluctance to introduce changes, so you are moving 
> away from the idea of improving the open source system and, just like 
> vtexperts, going towards developing commercial modules.
> 2. Strictly financial. Searching was destroyed in v6.x to such an 
> extent that you can easily make money from it.
>
> If basic functionalities like:
> 1. Searching
> 2. PDF printouts
> 3. Sending emails
> 4. etc.
> are going to be available only as paid solutions, then looking back 
> after some time you will notice that this did more harm than good for 
> the project, because a tiny company that implements this system would 
> have to purchase a few modules at the very beginning in order to start 
> working with it. If this is the direction Vtiger and the community are 
> going [that is, a poor free core and paid missing functionalities] 
> then let's be honest and stop calling it Open Source. I don't mean 
> your company, but all developer companies around vtiger that write 
> modules. At this point you're already starting to move in different 
> directions. A few stores with modules were created instead of just 
> one. You offer the same functionalities in ways that are compatible 
> neither business-wise nor logic-wise. And publishing paid modules 
> that should be a part of the system by default makes the system lose 
> its value.
>
> Going back to Vlastic: in my opinion using ElasticSearch is a bad 
> choice, because it doesn't solve the basic problem, which is poor 
> searching in the engine. Instead of fixing the problem by improving 
> the existing system, you added a separate tool that is not an 
> integral part of the system, and that creates problems in several 
> layers. 'Search' is a FUNDAMENTAL mechanism and cannot be changed 
> without changing the main engine and, above all, without significant 
> changes in the query generator.
>
> In other topics you wrote about tests on databases that had 20k 
> records. You launched a system with 120k test records. Even though 
> it's not the best testing environment [data were generated in just one 
> module instead of in all of them], it's important to know that MySQL 
> itself can easily deal with larger databases without any clustering. 
> It means that a few changes in Vtiger's core would allow for much more 
> than this module currently offers, especially as improving the core 
> gives incomparable benefits.
>
> Simple Vtiger code optimization might give significantly better 
> results than integration with any external tool. ElasticSearch is used 
> for a completely different purpose than what you used it for. I can 
> give you tons of examples where solutions directly in the core give 
> more benefits than your external tool and, most importantly, are way 
> faster if they're optimized.
> Take a look at how we developed our default search engine, and why 
> this solution is much better for the system than using ElasticSearch.
>
> ---
> Regards
> *Błażej Pabiszczak*
> /Chief Executive Officer/
> M: +48.884999123
> E: b.pabiszczak at yetiforce.com
>
> On 2016-04-21 11:18, Alan Lord wrote:
>
>> On 11/04/16 10:46, Alan Lord wrote:
>>> On 11/04/16 10:33, Sutharsan Jeganathan wrote:
>>
>>>> I think the demo CRM should have a large volume of data to preview the
>>>> effectiveness. I think this extension is for CRM instances with
>>>> larger data volumes, correct?
>>>
>>> That is a very good point. But we don't have any "safe" data. We have
>>> tested it on customers' systems with large volumes (> 1m records) but of
>>> course we cannot put that data on-line for all to see ;-)
>>>
>>> I will see if we can obtain some public data that we can use.
>>
>> I have updated our demo server and added and indexed around 121,000 
>> records. I have a lot more data I can add, but I don't have time to 
>> prepare the data for import yet.
>>
>> http://geotools.libertus.co.uk
>>
>> However, you should see that the search is still nice and fast with 
>> 120k records :-) Frankly, it doesn't seem to make a lot of difference 
>> to Elasticsearch how many rows it has.
>>
>> Hope this is helpful.
>>
>> A manual is available here: 
>> http://www.libertus.co.uk/images/documents/vlastic-manual.pdf
>>
>> Alan
>>
>>
>> _______________________________________________
>> http://www.vtiger.com/
>
