NetTools – What if – Performance issues

It is a little late for questions like this, but this morning I woke up with one: “What about performance?”

“Imagine how a larger audience would affect a plugin like this. Every time you, and hundreds of other people, browse to a website, the extension has to start analyzing it.”

Well, this is not a real problem as long as you store data “locally” (or via storage, so the data is shared between browsers). In that case, the storage immediately gives the user the stored data for a given site. If we store data keyed on the currently visited domain, both time and network performance are saved. It is when we try to synchronize visits against APIs that the real problems begin, especially if each element contains different kinds of URLs that have to be sent away for analysis. On a site like Facebook there may be a lot of data transferred out, since the extension won’t know in advance what the page contains.
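A minimal sketch of what such a local-first lookup could look like in a content script, assuming a Manifest V3 style promise-based chrome.storage API with the storage permission; the type and key names here are made up for illustration, not taken from the extension:

```typescript
// Hypothetical content-script helper: check local/synced storage first,
// and only fall back to a remote API when the domain has never been seen.
type SiteVerdict = { status: 'blacklisted' | 'trusted' | 'unknown'; checkedAt: number };

async function getVerdictForCurrentSite(): Promise<SiteVerdict> {
  const host = window.location.hostname;                 // key data on the visited domain only
  const cached = await chrome.storage.local.get(host);   // chrome.storage.sync would share it between browsers
  if (cached[host]) {
    return cached[host] as SiteVerdict;                  // no network traffic at all for known domains
  }
  // Unknown domain: this is the only case where a remote call would be needed.
  const verdict: SiteVerdict = { status: 'unknown', checkedAt: Date.now() };
  await chrome.storage.local.set({ [host]: verdict });
  return verdict;
}
```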

If we’d like to blacklist a linked URL, we’re not limited to Facebook. Blacklisted URLs must be reachable from every site that goes through analysis, and that probably won’t scale gracefully if the extension grows larger. One solution is to send out only the domain name (hashed?), but with a large amount of traffic this could still be an issue.

The idea itself could look like this:

Fetch all elements when the document has loaded.

All hostnames will be hashed – so www.test.com/link/to/somewhere looks like this …

aac5c2fc2cd9b325c8361ff2da0b2c4864e0c948

… where only www.test.com is part of the hash. Reducing all hrefs to their domains cuts down the amount of data being sent. If many of the links on a website point to the same place – or even better, if all of them do – there will be only one hash to send initially.
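A rough sketch of that hashing step, using the browser’s SubtleCrypto. The choice of SHA-1 is an assumption (the 40-character example above has SHA-1 length, but the post does not say which algorithm produced it):

```typescript
// Hash only the hostname of every link on the page and deduplicate,
// so identical domains produce a single hash to send out.
async function sha1Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest('SHA-1', new TextEncoder().encode(text));
  return Array.from(new Uint8Array(digest))
    .map((b) => b.toString(16).padStart(2, '0'))
    .join('');
}

async function collectHostHashes(): Promise<string[]> {
  const hosts = new Set<string>();
  for (const anchor of document.querySelectorAll<HTMLAnchorElement>('a[href]')) {
    try {
      hosts.add(new URL(anchor.href).hostname); // www.test.com/link/to/somewhere -> www.test.com
    } catch {
      // ignore hrefs that are not valid URLs (javascript:, mailto:, etc.)
    }
  }
  return Promise.all(Array.from(hosts).map(sha1Hex)); // one hash per unique domain
}
```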

Waiting for the document to be loaded comes with a price, though: Facebook loads its pages dynamically, so when scrolling downwards there will never be a finished document.

Fetch elements with DOMSubtreeModified.

The first versions of the Chrome extension handled this very well. Since the data was stored locally, fetching and analyzing elements was instant. There were only a few short words (nyatider, friatider, etc.) to look for. But sending data out online in this mode also happens instantly; the elements won’t arrive in bulk, so the data stream takes longer. With lots of usage this might of course become a problem. Hashing hosts is a good way to reduce it, but we can’t avoid the data stream itself. The bulk will be smaller, but the stream will still be there.
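DOMSubtreeModified is a deprecated mutation event; a sketch of the same idea with its modern replacement, MutationObserver, could look like the following. The handleNewLinks callback is a hypothetical hook into the analysis described above:

```typescript
// Watch for dynamically inserted content (e.g. Facebook's endless scroll)
// and hand any newly added links over to the analysis step.
function watchForNewLinks(handleNewLinks: (links: HTMLAnchorElement[]) => void): void {
  const observer = new MutationObserver((mutations) => {
    const links: HTMLAnchorElement[] = [];
    for (const mutation of mutations) {
      for (const node of mutation.addedNodes) {
        if (node instanceof HTMLElement) {
          links.push(...node.querySelectorAll<HTMLAnchorElement>('a[href]'));
        }
      }
    }
    if (links.length > 0) {
      handleNewLinks(links); // batch per mutation burst instead of one call per element
    }
  });
  observer.observe(document.body, { childList: true, subtree: true });
}
```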

The next problem to handle is location. Imagine usage from Japan when/if all analysis servers are located in a different part of the world. There will be delays. And downtime is not even considered here…


Network Tools Embryo v3 in the shape of 2.1.0

For a few hours, I booted up something that is supposed to become the new version of the extremely old project “Content Filter for Facebook”. The name has been changed at least twice (the name I just mentioned was the first one, which almost got the extension banned due to a trademark issue; I wasn’t allowed to use Facebook in the name). When it was renamed after that first trademark issue, I gave it the name “NETFILTER”, since my plan was to make – specifically – an extension for Chrome that filtered content. The platform was Facebook.

Just recently I realized that targeting only one platform was thinking inside the box rather than outside it, so I changed my mind. I started a project (in JIRA) with the plan to make it bigger. Even back then, the target was already more than just “stupid tests”. And a few hours ago, one of the more important reasons for this project became fascism and fake news.

Lazy as I am, I kept this in mind for a long time. “I have no time right now.” And I actually don’t. And most of the time I can’t motivate myself to go further. Today, I tried to fool myself. Many of my projects are initiated with the question “Is it even possible to do this?”. Asking this and convincing myself that the project is “just an experiment” is often what initiates something bigger. And today, it happened. I created – actually due to the discovery of some quite scary fascist videos on Youtube – the first embryo for “Network Tools 3.0”. The current version is an alpha release, so I borrowed the old “Content filter” source and named it 2.1.0.

So, how does it look? Well, it’s a quite simple base, since I want to start easy. The reason: there’s not much prepared yet. There are no scraping tools live, and netcurl is still in development. So the easiest way to boot this project is to keep most of the variables local. Adding, for example, blacklisted sites to the extension will be done via Google’s APIs, so there’s no sharing going on yet. This is also done for “other security reasons”.

As you can see above, the embryo is a tiny overlay box in the upper left corner. It is currently clickable; when clicked, a larger box opens. Actually, it’s the same overlay box expanding to full screen. In this mode, which will be upgraded shortly, we’re supposed to do something. In the example, the extension asks the user whether he/she wants to blacklist the page. This was only a test, though, to see how we should attack the problem later on. The purpose here is to see how far we can go compared to the earlier, now obsolete releases, which instead opened a context menu when right-clicking the mouse. Blacklisting elements that way was OK, but it won’t cover the new purpose, as the elements and click handling were quite limited. Instead I want to give myself – and eventually others – more options.
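A minimal sketch of how such an overlay could be injected from a content script; the element id, styling and prompt text here are made up for illustration and not taken from the actual extension source:

```typescript
// Inject a small clickable overlay box that expands to full screen on click.
function injectOverlay(): void {
  const box = document.createElement('div');
  box.id = 'nettools-overlay';                          // hypothetical id, not from the real source
  Object.assign(box.style, {
    position: 'fixed', top: '0', left: '0',
    width: '24px', height: '24px',
    background: 'rgba(30, 30, 30, 0.6)',
    zIndex: '2147483647', cursor: 'pointer',
  });

  let expanded = false;
  box.addEventListener('click', () => {
    expanded = !expanded;                               // same element, just resized to full screen
    box.style.width = expanded ? '100vw' : '24px';
    box.style.height = expanded ? '100vh' : '24px';
    box.textContent = expanded ? 'Blacklist this page?' : '';
  });

  document.body.appendChild(box);
}
```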

If we, for example, don’t want to blacklist the host, we might want to flag the site as trusted, untrusted, etc. Maybe we don’t want to do this for the entire host; maybe we just want to target a static URI within the site. The target may also be a specific URL within the page, inside a specific element, and so on. This was nearly impossible to make nice with context menus, as the options were too many. One way to model those options is sketched below.
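A sketch of such a rule record with an explicit scope, persisted via chrome.storage.sync; the field names and key format are just an illustration of the idea, not the extension’s actual schema:

```typescript
// A rule can apply to a whole host, a path within it, or one exact URL in one element.
type RuleScope = 'host' | 'path' | 'url' | 'element';
type RuleVerdict = 'blacklisted' | 'trusted' | 'untrusted';

interface FilterRule {
  scope: RuleScope;
  verdict: RuleVerdict;
  host: string;            // e.g. "www.test.com"
  path?: string;           // e.g. "/link/to/somewhere" when scope is narrower than 'host'
  selector?: string;       // CSS selector for the element when scope is 'element'
}

async function saveRule(rule: FilterRule): Promise<void> {
  const key = `rule:${rule.scope}:${rule.host}${rule.path ?? ''}`;
  await chrome.storage.sync.set({ [key]: rule });   // sync keeps rules available across browsers
}
```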

So, this is basically the start. Nothing meaningful happens on the click yet, and the overlay should eventually be completely invisible so it doesn’t interfere with anything on the site. The extension should stay completely silent until asked to work with something.

Source code base.


Testing netcurl and the future

Very few people are probably aware of the test suites running for the current release of netcurl via Atlassian Bamboo. Unfortunately, Bamboo already runs inside an OpenVZ virtualization, which makes the tests a bit limited (I don’t think it is a good idea to build Docker containers on top of OpenVZ, even if I probably could). One reason is that this kind of work may take more time to get a grip on than it would to just rebuild netcurl with other rules and tests. So instead, there’s an ongoing preparation to compile PHP cores directly on the platform, separated by prefixes. By doing this, I hope that wider tests can be run without building power-consuming containers; instead, each test can execute itself with the correct version of PHP.

Test results for PHP 5.6 and PHP 7.3

Since this post was written, the tests have been rebuilt on top of the collection of PHP binaries. These are the first results from the tests, running different PHP versions from separate paths on the servers. The tests are green again!

The test suite for NETCURL is located at the URL below.

https://bamboo.tornevall.net/browse/LIB-NETCURL


NetCURL-PHP: The Final Communicator – The Upcoming clickbait hunter

This project can be found at https://netcurl.org

The beginning

In the beginning there was curl. And curl was the driver of everything. A sloppily written library was put together to quickly fetch data from websites – websites that typically published proxies. The proxies were stored, scanned and tested, and those that still answered and actually forwarded traffic were blacklisted. Based on what curl could do, it also supported SOCKS proxy scans.

The Ecommerce sphere era

A few years later or so, there was ecommerce. There was no library that could do “everything”: you either needed to configure everything yourself or pull in more libraries, since the APIs in use had several entry points, SOAP being one of them. The idea of implementing this library was born.

The cleanup

However, the code was still quite sloppily written, so it got cleaned up and a Bitbucket project (link above) was created as an open source project. Suggestions not to use this wrapper came from different directions, and I explained why other wrappers were not a good fit – GuzzleHttp being one of the examples. The problem with “the others” was that they had to be fully configured and manually set up before they could be used. Our need was different: we needed something that required only a few lines of code to get started.

The expansion

NetCurl expanded to automatically detect available drivers. Curl was primary, SOAP secondary. Guzzle gave me the idea to extend the support to pick up Guzzle and WordPress if they were available, as they – in contrast to NetCurl – also supported streams, the default communication engine in PHP. So detection of this was built in.

Today!

As of today, NetCurl has developed into a quickly configurable library that calls HTTP sites and parses the responses into usable objects or arrays. NetCurl activates whatever it needs to fetch at a high verbosity level. It utilizes HTTP status codes to make calls throwable and extracts body data when necessary. However, the code was initially not written to be PSR compliant; the target for this code base right now is to make it so. One reason is to make the library conflict less with PSR-compliant sites, as the ecommerce base it is implemented in requires safer code. There are also plans to build in more default communication engines (like before), so that regardless of which drivers a web service uses, communication should always be available and chosen by “best practice”.

Finally

The next and probably last step is to start implementing this in a professional API base that can – like FnargBlog’s “RSSWatch” did – fetch data automatically, store it and analyze it, as part of the hunt for fake news, clickbait, changes in blogs, etc.


RSS Monitoring – ClickBaits and fascism

At the beginning of 2010, there was RSSWatch, a simple RSS feed monitor (tldr link here) used to fetch and monitor sites and warn about changes. The primary target was WordPress sites – and partially Facebook. But since Facebook recently changed their APIs due to the Cambridge Analytica “incident” in 2018, the idea of “hatewatching” Facebook became useless. Besides, GDPR happened. The prior ideas crashed into data regulations, and personal data – at least at Facebook – could no longer “just be downloaded”. The projects backfired.

However, the e-commerce project – NetCURL – was practically reborn and can still be used as an RSS monitor as before. The current problem is that the API that was used also has to be rebuilt for this purpose, and made more efficient. With all the standards we meet, NetCURL also has to be more compliant for these purposes. For example, the goal for the NetCURL engine is to make it PSR compliant so that dependent projects can use it as widely as possible without complications.

A short description of NetCURL is “a wrapper for web applications”, but unlike, for example, curl itself and applications like Guzzle, it configures itself with a proper driver that can handle web content of different kinds. It’s a simplifier that should require minimal configuration from developers.

Primarily, NetCURL 6.1 should support autoloading of drivers for curl, streams, SOAP and the more common drivers that can download content from the internet. With this finished, the prior project RSSWatch can be rebooted. And with RSSWatch, we can also give life to our clickbait watcher, NETFILTER, which I hope can be controlled via web-browser extensions.

Questions have been asked before about whether the clickbait filtering can be extended to “firewall out” fascism. It’s quite an interesting question, since the target of NETFILTER has been completely different. But since clickbait is a very “diffuse” term, it will most probably cover this and much other unwelcome content.


Rise and fall of RSSWatch – and the consequences

Apparently, time flies by very fast. Several years have gone by since the first versions of RSSWatch saw the light of day. Under the FnargBlog RSSWatch “brand”, a very sloppy release of an RSS monitor was built – at that time the goal was simple: monitor blogs and look for content updates. The reason was even simpler: it was an era when a bunch of young Swedish bloggers wrote provocative articles, and when people commented, either the content or the comments were manipulated.

At the time, WordPress and other hosting providers didn’t do much to prevent spam robots from entering the scene, so FnargBlog used the opportunity to publish updates to comment fields each time a comment was edited or removed. But RSSWatch didn’t only do this. The primary monitoring was to watch for post edits, as content could change completely so that comments no longer matched the original post.

In short, RSSWatch had great impact in the end. However, a power failure around Christmas 2015 practically killed the service completely. TornevallNET stepped in and took over some of the projects.

For a third time, lack of time hit RSSWatch hard. There was other work to do, and PHP was about to expire: old versions of PHP (5.3, 5.4) began to be deprecated, so most of the scripts died by themselves. However, the web fetcher engine that also scanned for open proxies got new life by joining the e-commerce business. Moving from the prior versions to the new purpose, it started to support a wider range of drivers to make sure that each server using it would have access to the internet, regardless of the platform (between very simple file_get_contents(), curl and SOAP support, I realized that the application never failed me).

And here we are now. Close to summer 2019, the plans are slowly rising again. Tornevall restored the forum platform, with the purpose of hosting and supporting those applications where the docs and the older WordPress sites are not enough. In short, I’m going to try to compile a list of what must be done before we can wake those old scanning services again – and of what could come in handy at the same time…


Hello readers!

The purpose of this blog is to make it possible to follow the development of Tornevall Networks. Unfortunately, Tornevall Networks is a “one man army”, and it may take time between updates and releases, as there is a private life outside the Tornevallverse that has to be taken care of too.

However, this is an attempt to stay up to date with reality.
