Changelog
Version 2.4 - Spring 2016 edition
===========
Main highlights of the release:
Conditional dataflow on the pipeline structure level. For every dataflow rule you can set up conditions
that will be computed based on the parameters of the context.
Multiple conditions can be grouped with an optional common ELSE branch where the dataflow will happen by default.
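As an illustration only, the grouping of conditions with a common ELSE branch can be sketched in Python. This is not eHive's implementation: the function pick_targets, the rule layout and the target names are hypothetical, conditions are plain Python callables rather than eHive expressions, and the sketch assumes that every condition that holds triggers its own target.

```python
def pick_targets(params, when_rules, else_target=None):
    """Return the targets whose condition holds; fall back to else_target.

    params      -- dict of parameters visible in the job's context
    when_rules  -- list of (condition, target) pairs; condition is a callable
    else_target -- optional default target used when no condition matches
    """
    matched = [target for condition, target in when_rules if condition(params)]
    if matched:
        return matched
    return [else_target] if else_target is not None else []


# Hypothetical rules: the target names are made up for this example.
rules = [
    (lambda p: p["digit"] > 1, "part_multiply"),
    (lambda p: p["digit"] == 1, "copy_through"),
]

print(pick_targets({"digit": 7}, rules, else_target="skip"))  # ['part_multiply']
print(pick_targets({"digit": 0}, rules, else_target="skip"))  # ['skip']
```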
INPUT_PLUS is a lightweight mechanism that allows a parent job to selectively pass its parameters to its children
without the need to specify which parameters are being passed. It's a significant simplification in comparison
with what could be achieved with templates, although templates will keep their niche for renaming and evaluating params.
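The idea behind INPUT_PLUS can be sketched as a dictionary merge. This is a conceptual sketch, not eHive code: the function name input_plus and the sample values are hypothetical, and the rule that emitted values override inherited ones of the same name is an assumption.

```python
def input_plus(parent_params, emitted=None):
    """Build a child's input as the parent's parameters plus anything emitted.

    Assumption: values emitted by the parent override inherited ones.
    """
    child = dict(parent_params)      # copy, so the parent is left untouched
    child.update(emitted or {})
    return child


# Hypothetical parameter values, loosely modelled on a multiplication job.
parent = {"a_multiplier": "9650156169", "b_multiplier": "327358788"}
child = input_plus(parent, {"digit": 7})
print(child)
```

The parent never has to list which of its parameters are passed on; only the extra values are named.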
New style URL parser that understands shorter URLs like "?table_name=foo", "?accu_name=bar&accu_address=[]" for referring to local objects.
It also allows referring to the absolute/relative SQLite filepath in full. Some compatibility sacrifices had to be made,
but in version/2.4 the old parsing way has priority over the new one, with a warning to encourage switching to the new format.
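To show what such short URLs carry, here is a minimal sketch that uses Python's standard urllib.parse rather than eHive's own parser. The function name parse_local_url is hypothetical, and treating "no scheme, host or path" as "local object" is an assumption made for the example.

```python
from urllib.parse import urlsplit, parse_qsl


def parse_local_url(url):
    """Return (is_local, query_parameters) for a URL such as '?table_name=foo'."""
    parts = urlsplit(url)
    # Assumption: a URL with no scheme, host or path refers to an object
    # in the current pipeline database.
    is_local = not parts.scheme and not parts.netloc and not parts.path
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    return is_local, query


print(parse_local_url("?table_name=foo"))
print(parse_local_url("?accu_name=bar&accu_address=[]"))
```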
NB: see these three features in action in the Long-multiplication pipelines
All the example pipelines have been grouped together under a new directory: Bio/EnsEMBL/Hive/Examples
New configuration mechanism to 'tweak' parameters and attributes of pipelines either during pipeline initialization or afterwards.
For tweaking things during initialization we have extended init_pipeline.pl to understand the 'tweak' commands -SET, -SHOW and -DELETE.
For tweaking things after the pipeline database has been created there is a new tweak_pipeline.pl script that understands the same 'tweaks':
-SET 'pipeline.param[take_time]=20' # override a value of a pipeline-wide parameter; can also create a parameter that does not exist yet
-SET 'pipeline.hive_pipeline_name=new_name' # override a value of a hive_meta attribute
-SET 'analysis[take_b_apart].param[base]=10' # override a value of an analysis-wide parameter; can also create a parameter that does not exist yet
-SET 'analysis[add_together].analysis_capacity=3' # override a value of an analysis attribute
-SET 'analysis[blast%].batch_size=15' # override a value of an analysis_stats attribute for all analyses matching a pattern
-SET 'analysis[part_multiply].resource_class=urgent' # set the resource class of an analysis (whether a resource class with this name existed or not)
-SET 'resource_class[urgent].LSF=-q yesteryear' # update or create a new resource description
In both contexts you can print out the current value of things:
-SHOW 'pipeline.hive_pipeline_name' # show the pipeline_name
-SHOW 'pipeline.param[take_time]' # show the value of a pipeline-wide parameter
-SHOW 'analysis[add_together].analysis_capacity' # show the value of an analysis attribute
-SHOW 'analysis[add_together].param[foo]' # show the value of an analysis parameter
-SHOW 'resource_class[urgent].LSF' # show the description of a particular meadow of a resource_class
Either pipeline-wide or analysis-wide parameters can also be deleted:
-DELETE 'pipeline.param[foo]' # delete a pipeline-wide parameter
-DELETE 'analysis[add_together].param[bar]' # delete an analysis-wide parameter
In addition to the simple attributes analyses also have two "complex" ones: wait_for and flow_into.
They can either be set from scratch:
-SET 'analysis[add_together].wait_for=["analysisX","analysisY"]' # remove all old wait_for rules, establish new ones
-SET 'analysis[part_multiply].flow_into={1=>"?table_name=intermediate_result"}' # remove all old flow_into rules, establish new ones
or you can append new ones to the existing pile of rules:
-SET 'analysis[add_together].wait_for+=["analysisZ","analysisW"]' # append two new wait_for rules
-SET 'analysis[part_multiply].flow_into+={1=>"another_sink"}' # append a new flow_into rule
You can only delete the whole set of rules, not individual rules:
-DELETE 'analysis[add_together].wait_for' # delete all wait_for rules of an analysis
-DELETE 'analysis[part_multiply].flow_into' # delete all flow_into rules of an analysis
You can also check their current content:
-SHOW 'analysis[add_together].wait_for' # shows the list of wait_for rules of an analysis
-SHOW 'analysis[part_multiply].flow_into' # shows the list of flow_into rules of an analysis
The 'tweak' mechanism does not require that you prepare the PipeConfig files with $self->o() references, which significantly simplifies PipeConfigs.
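The shape of the tweak expressions listed above can be summarised with a small parser. This is a sketch derived only from the examples in this changelog, not the parser used by init_pipeline.pl or tweak_pipeline.pl; the regular expression, the function name parse_tweak and the field names are all hypothetical.

```python
import re

TWEAK_RE = re.compile(
    r"(?P<scope>pipeline|analysis|resource_class)"   # what is being tweaked
    r"(?:\[(?P<pattern>[^\]]+)\])?"                  # optional [name or pattern]
    r"\.(?P<attr>\w+)"                               # attribute, 'param', 'LSF', ...
    r"(?:\[(?P<key>[^\]]+)\])?"                      # optional [parameter name]
    r"(?:(?P<op>\+?=)(?P<value>.*))?"                # optional '=' or '+=' and value
)


def parse_tweak(expression):
    """Split one tweak expression into its named components."""
    match = TWEAK_RE.fullmatch(expression)
    if match is None:
        raise ValueError(f"cannot parse tweak: {expression!r}")
    return match.groupdict()


for tweak in (
    "pipeline.param[take_time]=20",
    "analysis[blast%].batch_size=15",
    'analysis[add_together].wait_for+=["analysisZ","analysisW"]',
    "resource_class[urgent].LSF",
):
    print(parse_tweak(tweak))
```

Expressions without an operator, as used with -SHOW and -DELETE, come back with op and value set to None.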
Universal Runnables:
* JobFactory: non-contiguous split option has been added for those who have to use minibatching
* FastaFactory has been improved: more input file formats (which can be compressed) and a target output directory
* SqlCmd supports transactions
* new run_system_command() method available to all Runnables (defined in Process). It takes care of disconnecting from the eHive database and can capture stderr
* "Bash pipefail" mode is used to catch errors on both sides of pipes in many system() calls
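The effect of Bash's pipefail option mentioned in the last item can be demonstrated outside eHive. The sketch below is not the code of run_system_command(); it assumes a bash executable on the PATH, and the helper name run_pipeline is hypothetical.

```python
import subprocess


def run_pipeline(command, pipefail=True):
    """Run a shell pipeline under bash and return its exit status."""
    prefix = "set -o pipefail; " if pipefail else ""
    return subprocess.run(["bash", "-c", prefix + command]).returncode


# 'false' fails on the left side of the pipe, 'cat' succeeds on the right.
print(run_pipeline("false | cat", pipefail=False))  # 0: the failure is masked
print(run_pipeline("false | cat", pipefail=True))   # 1: the failure is reported
```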
Developer tools:
* Registry names can generally be used to refer to databases (go_figure_dbc())
* The parameter substitution behaviour when some components are unavailable has been standardised, param_exists() has been fixed
* An extra post_healthcheck() API method has been added to Runnables (and the POST_HEALTHCHECK status to Jobs) to stop failures in their tracks
* We reenabled cross-database dataflow and control rules and added a special Client/Server version of LongMult pipeline.
* The diagram display code can now display the newly added conditions (with a length limit) and cross-database dataflow or control rules (parts of "foreign" pipelines are shown on a different colour background).
* An experimental "Unicode-art" flow diagram drawing code has been implemented (skip the -output parameter in generate_graph.pl to see)
* eHive's DBAdaptor now has methods to get the list of eHive tables and views
* standaloneJob test method: warnings can be assessed via a regular expression
* Support for Slack WebHook integrations in beekeeper and a dedicated Runnable
Under the hood:
* HivePipeline object with its collections becomes the center of things, and TheApiary becomes the centralized way of accessing foreign objects
* A lot of work has been done on improving the test suite to run faster and cover more modules
* A failed prepare() shows a full stack trace on error
* Speed improvement of storing extended job parameters via adding an MD5 checksum based index
* The parsers of both 'bjobs' and 'bacct' have been extended to also support the output format of LSF v.9.1.2.0
* + numerous bug fixes, many of which have been ported to the previous version branches.
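To illustrate why a checksum-based index speeds up the storage of long parameter strings, here is a toy in-memory version of the idea. It does not reflect eHive's schema or adaptor code: the class ExtendedDataStore and its method name are hypothetical, and a Python dictionary stands in for the database index.

```python
import hashlib


class ExtendedDataStore:
    """Toy store that locates long strings via their MD5 checksum."""

    def __init__(self):
        self._rows = []      # stored strings; list position + 1 acts as the row id
        self._by_md5 = {}    # checksum -> list of row ids (stands in for the index)

    def store_if_needed(self, data):
        """Return the id of `data`, storing it only if it is not there yet."""
        checksum = hashlib.md5(data.encode("utf-8")).hexdigest()
        # Look up by the short checksum first, then confirm on the full text
        # so that a checksum collision cannot return the wrong row.
        for row_id in self._by_md5.get(checksum, []):
            if self._rows[row_id - 1] == data:
                return row_id
        self._rows.append(data)
        row_id = len(self._rows)
        self._by_md5.setdefault(checksum, []).append(row_id)
        return row_id


store = ExtendedDataStore()
long_params = '{"sequence":"' + "ACGT" * 100 + '"}'
print(store.store_if_needed(long_params))   # 1
print(store.store_if_needed('{"b":2}'))     # 2
print(store.store_if_needed(long_params))   # 1 again: found via the checksum
```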
Example pipelines and runnables:
* A new example pipeline that calculates %GC for a collection of sequences has been created. It is configured using the 'GCPct_conf' PipeConfig.
* Example Runnables and PipeConfigs are now grouped together under Bio/EnsEMBL/Hive/Examples.
** DbCmd contains the TableDumperZipper_conf PipeConfig, which illustrates usage of the DbCmd Runnable
** FailureTest contains the FailureTest_conf and MemlimitTest_conf PipeConfigs, along with the FailureTest runnable, which illustrate eHive error handling
** GC contains the GCPct_conf PipeConfig and two new Runnables, CalcOverallPercentage and CountATGC, which together form a simple example pipeline illustrating the eHive fan and accumulator features.
** Factories contains four PipeConfigs illustrating the use of a 'factory' runnable to create fans of jobs. CompressFiles_conf, RunListOfCommandsOnFarm_conf, and ApplyToDatabases_conf use the JobFactory runnable to create the fan, whilst FastaFactory_conf illustrates the use of the more specialised FastaFactory runnable.
** LongMult contains the long multiplication example pipeline. There are several PipeConfigs that implement this pipeline using different eHive features, such as the parameter stack, the new Input_Plus mechanism, and client-server interactions.
** SystemCmd contains AnyCommand_conf, a very simple PipeConfig that runs a single command using SystemCmd.
Version 2.3 - Spring 2015 edition
===========
Main highlights of the release:
API for Runnables written in "guest languages" (with reference Python implementation and examples)
Test suite (inspired by Roy's original pull request)
"TailTrimmer" [ in analyses with nontrivial batch sizes ] several techniques are now used to automatically decrease the batch size
towards the end of the analysis in order to speed up the execution of the whole analysis
Stability improvements that significantly increase efficiency of parallel execution
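The changelog does not spell out which techniques "TailTrimmer" uses, so the following is only one plausible heuristic, shown to make the idea concrete: cap the batch at each worker's fair share of the remaining jobs. The function name and its arguments are hypothetical and are not part of the eHive API.

```python
import math


def trimmed_batch_size(configured_batch_size, ready_jobs, active_workers):
    """Shrink the batch so the remaining jobs are spread across the workers."""
    if ready_jobs <= 0:
        return 0
    fair_share = math.ceil(ready_jobs / max(active_workers, 1))
    return min(configured_batch_size, fair_share)


print(trimmed_batch_size(100, 5000, 10))  # 100: plenty of work, full batches
print(trimmed_batch_size(100, 40, 10))    # 4: tail of the analysis, small batches
```

With full batches at the tail, one worker could claim all 40 remaining jobs while nine others sit idle; smaller batches let them finish in parallel.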
[ higher level features ]
* support for Runnables written in Python3 and API for extending similar support to other languages (this API may still change)
* coloured Beekeeper output - catches the eye!
* SystemCmd now runs through Capture::Tiny and captures the error output from the actual command, which gets stored in log_message
* SystemCmd also knows how to capture MEMLIMIT events from the underlying Java code
* SystemCmd can map specific return codes to dataflow events
* a new DbCmd runnable that mimics the behaviour of db_cmd.pl script ; you can also pipe data in or out of the connection to another system command
* DbCmd, DatabaseDumper and MySQLTransfer runnables hide passwords in the command lines that they run
* beekeeper.pl -unkwn option to clean up the workers found to be in UNKWN state (at the user's risk!)
[ lower level features ]
* record the 'meadow_user' in each Worker entry -- these values are also used when querying the Meadow to avoid running an equivalent of "-u all" in SGE Meadow
* record the 'when_seen' timestamp in each Worker entry -- when the Worker was last seen as running by the Beekeeper process.
* testing: introduced a Travis-integrated test suite loosely based on Roy's original pull request
-- the extended version tests direct API calls, runs individual Runnables (and tests their dataflow/warning events) or whole pipelines
* testing: Travis runs tests against Hive databases stored in local MySQL, PostgreSQL and SQLite databases
* stability [too many simultaneous queries] : detect and log deadlock collisions and retry them for a given number of times before failing
* stability [running out of server connections] : try to resolve the "too many connections" situation by bouncing, waiting and retrying
* stability [running out of local ports] : avoiding RELOCATED workers by applying incremental backoff-and-retry approach from Ethernet CSMA/CD protocol
* stability [applying an incorrect patch] : schema patches now have internal SQL-based checks and should not cause much damage if applied in wrong order
+ a new script to create such patches
* the schema version changes to 73
* multiple bug fixes, many of which have been ported to the previous version branches.
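The retry-based stability items above (deadlocks, "too many connections", running out of local ports) share one pattern: retry a limited number of times with growing random delays. The sketch below shows that pattern with the binary exponential backoff used by Ethernet CSMA/CD. Whether eHive follows exactly this schedule is an assumption; the function name, its arguments and the error text are hypothetical.

```python
import random
import time


def retry_with_backoff(action, is_retryable, max_attempts=5, slot=0.1,
                       sleep=time.sleep, rng=random):
    """Call `action`, retrying retryable failures with growing random delays."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == max_attempts or not is_retryable(error):
                raise
            # After the n-th failure wait a random number of slots in
            # [0, 2**n - 1], as in Ethernet's binary exponential backoff.
            slots = rng.randint(0, 2 ** attempt - 1)
            sleep(slots * slot)


attempts = []


def flaky():
    """Fail twice with a deadlock-like error, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise RuntimeError("Deadlock found when trying to get lock")
    return "done"


result = retry_with_backoff(flaky, lambda e: "Deadlock" in str(e),
                            sleep=lambda seconds: None)  # no real waiting here
print(result, len(attempts))  # done 3
```

The random component keeps competing workers from retrying in lockstep and colliding again.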
Version 2.2 - 'analyses_pattern'
===========
* Running and maintenance of pipeline subsets has been made easy with -analyses_pattern option in beekeeper.pl
that understands ranges and additive/subtractive merging. You can refer to analyses in many different ways.
Examples:
-analyses_pattern 1..9 # show scheduling for a range of analysis_ids
-analyses_pattern 1..9,11..15 -run # run a scheduling iteration for two ranges of analysis_ids
-analyses_pattern fasta% -sync # sync analyses matching a pattern
-analyses_pattern 1..9-5-report -loop # loop over a range except two analyses
-analyses_pattern 1..9,fasta% -reset_all_jobs # reset all jobs belonging to a range and a pattern
-analyses_pattern foo,bar,baz -reset_failed_jobs # reset failed jobs belonging to three analyses by names
* The same option is available in runWorker.pl to constrain the set of analyses to specialize into (fully works with -can_respecialize 1 mode)
* Detailed log of Scheduler's decision-making process is available
* db_cmd.pl and SystemCmd.pm runnable have been reworked and are now better adapted for quoted arguments
* Doxygen API documentation packaged with the code
* Scripts' man pages converted into HTML and packaged with the code
* New docs about installing eHive, running eHive and running MPI jobs with eHive
* Using rawgit to render HTML docs hosted on GitHub (impossible otherwise)
* No schema changes since version/2.1 : the same database should continue to work with newer code without patching
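The -analyses_pattern examples above combine ranges, names, '%' wildcards, additive ',' and subtractive '-'. A minimal sketch of how such a pattern could be expanded follows. It is inferred from the examples only and is not beekeeper.pl's parser: the function names and the sample analysis table are hypothetical, and analysis names that themselves contain '-' or ',' are not handled.

```python
import re


def _ids_for_term(term, analyses):
    """Resolve one term (range, id, name or %-pattern) to a set of ids."""
    range_match = re.fullmatch(r"(\d+)\.\.(\d+)", term)
    if range_match:
        low, high = int(range_match.group(1)), int(range_match.group(2))
        return {i for i in analyses if low <= i <= high}
    if term.isdigit():
        return {int(term)} & set(analyses)
    # Treat '%' as an SQL-style wildcard matching any run of characters.
    regex = ".*".join(re.escape(part) for part in term.split("%"))
    return {i for i, name in analyses.items() if re.fullmatch(regex, name)}


def expand_pattern(pattern, analyses):
    """Return the sorted analysis ids selected by `pattern`.

    analyses -- dict mapping analysis id (int) to analysis name (str)
    """
    selected = set()
    operator = ","                       # the first term is always added
    # Split into terms, keeping the ',' (add) and '-' (subtract) separators.
    for token in re.split(r"([,-])", pattern):
        if token in (",", "-"):
            operator = token
        elif operator == ",":
            selected |= _ids_for_term(token, analyses)
        else:
            selected -= _ids_for_term(token, analyses)
    return sorted(selected)


# Hypothetical pipeline: ids and names are made up for the example.
analyses = {1: "take_b_apart", 2: "part_multiply", 3: "add_together",
            4: "report", 5: "fasta_load", 6: "fasta_check"}

print(expand_pattern("1..3", analyses))           # [1, 2, 3]
print(expand_pattern("1..6-5-report", analyses))  # [1, 2, 3, 6]
print(expand_pattern("1..2,fasta%", analyses))    # [1, 2, 5, 6]
```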
Version 2.1 - 'multirole'
===========
* Improved internal API that allows implicit lazy-loading of objects associated with other objects via their dbIDs
* Objects that make up pipeline's graph can be loaded into cache, which simplifies structural topup of existing pipeline databases
* Diagram-drawing engine was stripped of its dependence on dbIDs, so diagrams can now be built directly from PipeConfig file(s) using -pipeconfig option(s)
* -analysis_topup removed (became the default mode of operation), -job_topup removed in favour of seed_pipeline.pl providing same functionality
* pipeline_wide_parameters moved into a separate table, so hive-specific 'meta' table is no longer needed, and Ensembl's version can happily coexist
* 'monitor' table removed in favour of offline generate_timeline.pl script (that does not require a constantly running beekeeper.pl for data generation)
* pipeline_create_commands() is executed even on topup; redefine to return an empty list or use -hive_no_init if you don't need commands to be executed
* Switched to 'worker_resource_usage' table, unified resource collection calls for other Meadows, so SGE/CONDOR/etc resources can be shown in guiHive & timeline.
* Introduced 'role' table and Role objects to better track role-switching of multirole Workers
* Added Process::complete_early() as the blessed way to exit the code early successfully and store a log_message
* More careful semaphore rebalancing strategy that can also be switched on or off during pipeline database generation
* Logging and error reporting has been improved and simplified
* Multiple bugs have been fixed
Version 2.0 - a major 'coreless' release of Hive code
===========
* Removed dependencies on EnsEMBL core code. You don't need to install Ensembl core to run non-Ensembl pipelines.
* Moved Ensembl-specific configuration to EnsemblGeneric_conf, from which all Ensembl pipelines should now inherit.
Version 1.9 - largely a maintenance release + preparations for separation from Ensembl core
===========
* Various preparations to make the code more GitHub-friendly
* A better class hierarchy with fewer dependencies on Ensembl core code
* At last we have a proper code version test: "use Bio::EnsEMBL::Hive::Version 1.9;" works, but "use Bio::EnsEMBL::Hive::Version 2.0" currently fails.
* "beekeeper --version", "runWorker.pl --version" and "db_cmd.pl --version" report both code version and Hive database schema version
* Multiple bug fixes
---------------------[before EnsEMBL rel.75]---------------------------------------------------------------------------
* Wed Dec 11 12:55:58 2013 +0000 | Leo Gordon | updated schema diagram (PNG) and description (HTML)
* Mon Dec 9 14:19:48 2013 +0000 | Leo Gordon | bugfix: sqlite mode now works again
* Mon Dec 9 14:01:27 2013 +0000 | Leo Gordon | added Apache 2.0 license to all files
* Wed Dec 4 11:26:09 2013 +0000 | Leo Gordon | schema_change: switched some foreign keys to ON DELETE CASCADE (thanks, Harpreet!)
* Wed Dec 4 11:04:14 2013 +0000 | Matthieu Muffato | Updated the list of dependencies
* Wed Dec 4 10:53:17 2013 +0000 | Matthieu Muffato | Added info on how to run lsf_report.pl and generate_timeline.pl
* Wed Dec 4 10:31:51 2013 +0000 | Matthieu Muffato | Removed the option to use a logscale axis, and added a grid in the background
* Mon Dec 2 18:13:29 2013 +0000 | Matthieu Muffato | Another set of rounding errors
* Mon Dec 2 18:01:59 2013 +0000 | Matthieu Muffato | Added a mode to plot the number of pending workers for each analysis
* Mon Dec 2 18:01:30 2013 +0000 | Matthieu Muffato | Neater way to add/subtract a worker
* Mon Dec 2 18:00:02 2013 +0000 | Matthieu Muffato | Added a mode to plot the amount of unused CPU cores each analysis
* Mon Dec 2 17:55:13 2013 +0000 | Matthieu Muffato | "Unused memory" instead of "Wasted memory"
* Mon Dec 2 17:01:52 2013 +0000 | Matthieu Muffato | Pulls in the time information (pending time, cpu usage, lifespan)
* Mon Dec 2 09:56:07 2013 +0000 | Matthieu Muffato | Improved the documentation
* Mon Dec 2 09:53:53 2013 +0000 | Matthieu Muffato | Added "verbose" mode
* Mon Dec 2 08:35:35 2013 +0000 | Matthieu Muffato | Not valid any more when counting the wasted memory (rounding errors)
* Sun Dec 1 23:20:35 2013 +0000 | Matthieu Muffato | Added a mode to plot the amount of wasted memory by each analysis
* Sun Dec 1 23:11:11 2013 +0000 | Matthieu Muffato | Also store the meadow_name in lsf_report
* Sun Dec 1 23:10:21 2013 +0000 | Matthieu Muffato | The unit conversion table is constant
* Sun Dec 1 23:03:56 2013 +0000 | Matthieu Muffato | dbname may be undefined
* Sun Dec 1 22:17:37 2013 +0000 | Matthieu Muffato | Added a mode to plot the number of CPU cores used by each analysis
* Sun Dec 1 22:13:24 2013 +0000 | Matthieu Muffato | Added a mode to plot the RAM used by each analysis
* Tue Dec 3 12:19:20 2013 +0000 | Leo Gordon | create a separate directory layer to group log files of the same iteration
* Tue Dec 3 11:56:07 2013 +0000 | Leo Gordon | separate output files by LSF_job_id and LSF_jobarray_index
* Tue Nov 26 11:08:31 2013 +0000 | Leo Gordon | simplify logging of submission output/error streams
* Wed Nov 27 12:19:20 2013 +0000 | Matthieu Muffato | s/profile/timeline/g
* Wed Nov 27 11:46:31 2013 +0000 | Matthieu Muffato | Gets the birth/death events instead of sampling the database. The "NOTHING" curve is not needed any more
* Thu Nov 14 01:01:33 2013 +0000 | Matthieu Muffato | Reads the data from the database once at the beginning, and process it offline
* Wed Sep 11 00:17:51 2013 +0100 | Matthieu Muffato | "DarkSlateGray" looks better for the "NOTHING" curve
* Wed Sep 11 00:09:00 2013 +0100 | Matthieu Muffato | Added documentation
* Wed Sep 11 00:08:42 2013 +0100 | Matthieu Muffato | GNUplot is now controlled via Chart::Gnuplot
* Tue Sep 10 14:31:15 2013 +0100 | Matthieu Muffato | Improved the GNU-plot output
* Tue Sep 10 01:20:28 2013 +0100 | Matthieu Muffato | Only gnuplot has to know about the filtered analysis, the CSV file should still contain all the data
* Tue Sep 10 00:51:55 2013 +0100 | Matthieu Muffato | First version of a script to generate the analysis profile of a pipeline
* Mon Nov 25 16:57:37 2013 +0000 | Leo Gordon | schema_change: detect and register RELOCATED events that used to mess up things on LSF 9.0 ("job rescheduled" in LSF parlance)
* Mon Nov 25 16:54:11 2013 +0000 | Leo Gordon | be more careful with fetch_overdue_workers (Use 5sec threshold to avoid checking recently active Workers. Do not use it at all when performing -all_dead.)
* Mon Nov 25 16:47:35 2013 +0000 | Leo Gordon | bugfix: last_check_in should only be updated by register_worker_death if the Worker is burying itself
* Mon Nov 25 16:35:16 2013 +0000 | Leo Gordon | cosmetic: added (commented out) warning messages for every external system() call that LSF module runs - simplifies debugging a lot
* Mon Nov 25 14:08:52 2013 +0000 | Leo Gordon | ranked claiming technology: added support for both sqlite and pgsql drivers
* Thu Nov 21 15:40:31 2013 +0000 | Leo Gordon | The last resort: try claiming without an offset (risking a collision)
* Tue Nov 19 11:17:38 2013 +0000 | Leo Gordon | use OFFSET to separate jobs being claimed into ranges
* Mon Nov 18 14:55:11 2013 +0000 | Leo Gordon | No need to left join into worker table - thanks, Javier!
* Tue Nov 12 16:42:32 2013 +0000 | Leo Gordon | ResourceDescription expanded to include both submission_cmd_args and worker_cmd_args. Both args can be specified in a PipeConfig file.
* Tue Nov 12 11:15:56 2013 +0000 | Leo Gordon | increase TotalRunningWorkersMax to 2000
* Mon Nov 11 14:32:04 2013 +0000 | Leo Gordon | added an example of how to turn a csv into a list by param_substitute
* Wed Nov 6 11:13:35 2013 +0000 | Leo Gordon | introducing db_cmd() interface method that takes care of the path to db_cmd.pl
* Tue Nov 5 09:33:37 2013 +0000 | Matthieu Muffato | bugfix: the batch_size parameter should have a hyphen in front of it
* Fri Oct 25 15:28:42 2013 +0100 | Leo Gordon | (1) do not change SEMAPHORED jobs to READY and (2) support more flexibility in choosing which statuses to reset
* Fri Oct 25 11:35:57 2013 +0100 | Leo Gordon | schema change: turned all VARCHAR(<255) into VARCHAR(255) -- should improve experience with long host names (thanks, MichaelP!)
* Fri Oct 25 10:24:45 2013 +0100 | Leo Gordon | param_required() now automatically sets transient_error(0) before dying, to avoid unnecessary retries (thanks, Matthieu!)
* Thu Oct 24 15:37:36 2013 +0100 | Matthieu Muffato | "expected_size" has to be substituted as well
* Tue Oct 15 11:21:16 2013 +0100 | Matthieu Muffato | bugfix: the query has to be re-substituted for each job
* Tue Oct 8 10:58:22 2013 +0100 | Matthieu Muffato | The SqlHealthcheck runnable can now perform multiple tests
* Fri Sep 27 18:16:11 2013 +0100 | Matthieu Muffato | -reg_conf and -reg_type can be omitted in db_cmd.pl
---------------------[after Sept'2013 workshops]------------------------------------------------------------------------
* Tue Oct 1 16:30:14 2013 +0100 | Leo Gordon | newer Perl required, BioPerl no longer required, seed_pipeline.pl mentioned
* Tue Oct 1 13:03:21 2013 +0100 | Leo Gordon | pipeline_name is now automatically computed from ClassName; simplified workshop's example files and slides
* Fri Sep 27 15:21:04 2013 +0100 | Leo Gordon | added param_exists() method for checking whether a parameter has been initialized at all
* Thu Sep 26 23:57:55 2013 +0100 | Leo Gordon | cleaned up the last (optional) slide on pipeline_wide_parameters; removed the exercise about abstracting out the compressor (formerly from CompressFiles_conf)
* Thu Sep 26 23:54:55 2013 +0100 | Leo Gordon | separated out "long addition" functionality to concentrate on Hive API when writing the Runnable, and not on maths
* Thu Sep 26 10:53:44 2013 +0100 | Leo Gordon | bugfix: ENSCOMPARASW-131. Swapped two rearrange() calls for slicing a hashref
* Wed Sep 25 16:42:47 2013 +0100 | Leo Gordon | bugfix: ENSCOMPARASW-132. When all dependent jobs (>1) fail to be created due to unique constraint, they now correctly update status to READY
* Wed Sep 25 15:43:58 2013 +0100 | Leo Gordon | bugfix: make sure the pipeline works even when b_multiplier only contains digits 0 and 1
* Wed Sep 25 15:03:09 2013 +0100 | Leo Gordon | bugfix: properly support evaluation of complex substituted expressions that yield a hashref
---------------------[before Sanger workshop]---------------------------------------------------------------------------
* Mon Sep 23 12:29:44 2013 +0100 | Leo Gordon | added "git clone" option
* Mon Sep 23 12:22:07 2013 +0100 | Leo Gordon | some corrections to slides part2
* Sun Sep 22 20:18:42 2013 +0100 | Leo Gordon | part3 of the slides and the solutions (first version)
* Sat Sep 21 22:31:29 2013 +0100 | Leo Gordon | updated slides for parts 1 and 2 and solutions2.tar
* Thu Sep 19 11:25:37 2013 +0100 | Leo Gordon | Sanger version of the first part (re-made in LibreOffice)
* Mon Sep 16 09:30:15 2013 +0100 | Leo Gordon | bugfix: should not assume the presence of JobAdaptor in dataflow
* Fri Sep 13 16:28:13 2013 +0100 | Leo Gordon | alternative substitution syntax #expr( #alpha#*#beta# )expr# and a test script
* Fri Sep 13 11:17:45 2013 +0100 | Leo Gordon | cleanup: two templates that are no longer necessary
* Wed Sep 11 16:45:53 2013 +0100 | Leo Gordon | new colourscheme has arrived!
* Tue Sep 10 16:43:29 2013 +0100 | Leo Gordon | typo bugfix: jobs-->job in SQL
* Tue Sep 10 15:46:40 2013 +0100 | Leo Gordon | bugfix: reset the tried jobs to retry_count=1 and untried ones to retry_count=0 when doing a bulk reset
* Mon Sep 9 13:11:10 2013 +0100 | Leo Gordon | changes made before the talk
* Sun Sep 8 22:58:11 2013 +0100 | Leo Gordon | Preliminary version of slides for the second part of the workshop.
* Sun Sep 8 19:20:02 2013 +0100 | Leo Gordon | bugfix: we should allow any characters apart from { and } in the key
* Sun Sep 8 14:37:43 2013 +0100 | Leo Gordon | cosmetic: a hint for people working on the example
* Sat Sep 7 14:25:36 2013 +0100 | Leo Gordon | added support for EHIVE_HOST and EHIVE_PORT envariables; useful for the workshop environment
* Sat Sep 7 12:35:11 2013 +0100 | Leo Gordon | imported List::Util to be able to run max/min/sum of lists in substituted expressions
* Sat Sep 7 11:26:18 2013 +0100 | Leo Gordon | bugfix: now correctly supports directory names with dots in them
---------------------[before EBI workshop]---------------------------------------------------------------------------
* Thu Sep 5 16:55:44 2013 +0100 | Leo Gordon | PDF version of the workshop slides from GoogleDocs
* Thu Sep 5 09:37:00 2013 +0100 | Leo Gordon | adding new unit - T for terabytes (mainly to pacify EBIs LSF 8 with a reporting bug)
* Wed Sep 4 21:54:43 2013 +0100 | Leo Gordon | the initial state of MemlimitTest pipeline for the workshop
* Wed Sep 4 13:06:46 2013 +0100 | Leo Gordon | methods dbconn_2_mysql(), dbconn_2_pgsql(), db_connect_command(), db_execute_command() are DEPRECATED - use db_cmd.pl instead
* Wed Sep 4 12:49:04 2013 +0100 | Leo Gordon | added support for -pipeline_url as an input parameter (no need to supply hive_driver or password in this case)
* Wed Sep 4 11:53:23 2013 +0100 | Leo Gordon | allow the port number to be skipped but the colon to be present
* Fri Aug 30 15:09:05 2013 +0100 | Leo Gordon | a new example pipeline designed to fail because of MEMLIMIT in some of the cases
* Tue Aug 27 12:09:20 2013 +0100 | Leo Gordon | bugfix: properly use different memory units to compute the memory req in megabytes
* Fri Aug 23 14:40:51 2013 +0100 | Leo Gordon | now performing deep-stack substitution for whatever is dataflown into tables (rather than just dataflowing the output_id)
* Fri Aug 23 12:48:20 2013 +0100 | Leo Gordon | bugfix: substituting the accu signature on demand from the very depths of emitting job's param_stack
* Fri Aug 23 10:46:46 2013 +0100 | Leo Gordon | bugfix: make sure longer input_id hashes are correctly fetched from analysis_data table in "param stack" mode
* Thu Aug 22 15:49:32 2013 +0100 | Leo Gordon | Simplified interface: now db_cmd.pl understands 'CREATE DATABASE' and 'DROP DATABASE' without parameters, given a full URL.
* Thu Aug 22 14:40:11 2013 +0100 | Leo Gordon | Can now do a mysqldump given a URL or Registry data, using a newly supported -to_params option. Note the necessity of 'eval' before 'mysqldump' (it removes quotes around the password).
* Thu Aug 22 11:13:09 2013 +0100 | Leo Gordon | bugfix: sorting by job_id should be numeric, not alphabetic
* Wed Aug 21 16:13:26 2013 +0100 | Leo Gordon | renamed db_conn.pl to db_cmd.pl to avoid the name clash with already existing term
* Wed Aug 21 15:55:46 2013 +0100 | Leo Gordon | Updated schema diagram and description file that include param_id_stack and accu_id_stack in job table.
* Wed Aug 21 14:53:11 2013 +0100 | Leo Gordon | "parameter stack" implementation using two extra fields in job table. Accu content intended for any job_id has preference over Input_id content for the same job.
* Wed Aug 21 11:34:01 2013 +0100 | Leo Gordon | store and retrieve hive_meta.'hive_use_param_stack'
* Wed Aug 21 11:31:20 2013 +0100 | Leo Gordon | cosmetic: reduce the number of synonymous calls to DBI
* Wed Aug 21 10:14:00 2013 +0100 | Leo Gordon | Dataflowing minimal information out of Runnables, relying on templates in PipeConfig file to extend it if needed
* Tue Aug 20 14:32:51 2013 +0100 | Leo Gordon | shortened connection parameters in docs
---------------------[after EnsEMBL rel.73]---------------------------------------------------------------------------
* Thu Aug 15 16:18:49 2013 +0100 | Leo Gordon | Bugfixes to pacify pgsql: changed a non-functional "HAVING" into a nested SELECT, and changed unsupported SUM() into COUNT(CASE ... )
* Thu Aug 15 16:15:28 2013 +0100 | Leo Gordon | An important comment about UNIX sockets (without a port number) vs TCPIP sockets (with a port number).
* Thu Aug 15 14:30:40 2013 +0100 | Leo Gordon | Expose parts of pipeline_db, make them less EnsEMBL-specific, allow multiple failover initializers and use self-reference if none of them worked. Phasing out $self->o('ENV', ...) expressions
* Thu Aug 15 14:27:43 2013 +0100 | Leo Gordon | Allow skipping the port number; you no longer need to define your port if you are happy with driver's default (thanks to db_conn.pl and core's DBConnection)
* Wed Aug 14 18:44:38 2013 +0100 | Leo Gordon | Registry support is now centralised in DBAdaptor, so scripts just pass reg_* options into the constructor. Passing -reg_type allows connecting to originally non-Hive Registry entries.
* Wed Aug 14 12:58:04 2013 +0100 | Leo Gordon | make sure diagrams are generated from non-Hive registry entries as long as they are Hive-hybrids
* Wed Aug 14 10:44:29 2013 +0100 | Leo Gordon | Support extra parameters added to the client's command line
* Tue Aug 13 17:13:07 2013 +0100 | Leo Gordon | Start using the new db_conn.pl script instead of building driver-specific commands and running them.
* Tue Aug 13 17:10:45 2013 +0100 | Leo Gordon | Execute individual SQL commands as well as sessions; translate some db-meta SQLite into Bash; control verbosity
* Tue Aug 13 15:18:28 2013 +0100 | Leo Gordon | Schema change: changed the data type of monitor.analysis to TEXT as per Michael Paulini's suggestion, to fit more and longer analysis names.
* Tue Aug 13 15:14:01 2013 +0100 | Leo Gordon | Make this patch less mysql-dependent. Needs testing with PostgreSQL.
* Tue Aug 13 15:12:04 2013 +0100 | Leo Gordon | Allow multiple driver-dependent versions of the same patch; suggest schema patching with db_conn.pl commands.
* Fri Aug 9 15:46:37 2013 +0100 | Leo Gordon | concession for Bio::EnsEMBL::DBSQL::DBConnection that does not support urls
* Fri Aug 9 15:20:49 2013 +0100 | Leo Gordon | A unified dispatching client for databases. Finds the correct database client via -url or -reg_conf/-reg_alias combination.
* Fri Aug 9 15:11:09 2013 +0100 | Leo Gordon | Give a more meaningful warning if EHIVE_ROOT_DIR is not set (probably because an external script is trying to run Hive API)
* Sun Jul 28 20:47:52 2013 +0100 | Leo Gordon | bugfix: count both DONE and PASSED_ON jobs when re-balancing semaphores
* Thu Jul 11 11:30:27 2013 +0100 | Leo Gordon | included a new -nosqlvc flag in beekeeper.pl and runWorker.pl to overcome the version restriction in non-critical cases
* Thu Jul 11 11:28:58 2013 +0100 | Leo Gordon | bugfix: propagate no_sql_schema_version_check parameter through the URLFactory/DBAdaptor loop (should be re-factored at some point)
* Wed Jul 10 16:18:37 2013 +0100 | Leo Gordon | cleaned up the pipeline_create_commands a bit
* Tue Jul 9 17:15:32 2013 +0100 | Leo Gordon | the actual schema change (log_message.worker_id DEFAULT NULL)
* Tue Jul 9 17:03:08 2013 +0100 | Leo Gordon | Log all instances when a semaphore had to be re-balanced
* Tue Jul 9 17:02:04 2013 +0100 | Leo Gordon | schema change: allow recording of log_messages with worker_id=NULL
* Tue Jul 9 16:15:19 2013 +0100 | Leo Gordon | changed the interface of balance_semaphores() : pass in $filter_analysis_id instead of $filter_analysis
* Tue Jul 9 15:59:59 2013 +0100 | Leo Gordon | support selective balancing of semaphores funneling into a specific analysis
* Tue Jul 9 15:44:45 2013 +0100 | Leo Gordon | automate the re-balancing of semaphore_counts - do it when there is nothing running
* Tue Jul 9 15:38:47 2013 +0100 | Leo Gordon | introduced a new -balance option for beekeeper.pl so that semaphore_counts could be force-balanced
* Mon Jul 8 15:48:38 2013 +0100 | Leo Gordon | bugfix: back to using CONCAT -- it looks like || operator is non-standard in MySQL
* Tue Jul 2 16:17:01 2013 +0100 | Leo Gordon | start using procedures.pgsql with two main views ("progress" and "msg")
* Tue Jul 2 16:16:00 2013 +0100 | Leo Gordon | start showing resource_class in "progress" view + some SQL unification
* Tue Jul 2 13:15:37 2013 +0100 | Leo Gordon | bugfix: produce more specific bug report (either cannot connect or hive_meta unavailable)
* Tue Jul 2 12:52:30 2013 +0100 | Leo Gordon | separated the task of URL parsing out of the dba caching mechanism (needs more work)
* Mon Jul 1 12:10:44 2013 +0100 | Leo Gordon | bugfix: make sure we are getting the actual meta_value for hive_use_triggers
* Fri Jun 28 16:53:58 2013 +0100 | Leo Gordon | added 'hive_meta' to the list of tables being dumped
* Fri Jun 28 16:35:59 2013 +0100 | Leo Gordon | docs: documented the -input_id command line option
* Fri Jun 28 16:32:24 2013 +0100 | Leo Gordon | optimization: no point in catching and re-throwing my own throw!
* Fri Jun 28 16:27:09 2013 +0100 | Leo Gordon | bugfix: do not attempt to show AnalysisStats in case of an unspecialized Worker
* Fri Jun 28 11:40:31 2013 +0100 | Leo Gordon | bugfix: substituted the hard-coded value for the formula
* Thu Jun 27 16:17:48 2013 +0100 | Leo Gordon | tell the user whether to update the code to match the database SQL schema version, or which SQL patches to apply to the database
* Thu Jun 27 09:24:33 2013 +0100 | Leo Gordon | start checking Hive SQL schema version (code version against db version) and die on mismatch
* Thu Jun 27 09:19:42 2013 +0100 | Leo Gordon | bugfix: make sure we are only getting one value, not the rowhash
* Wed Jun 26 17:35:03 2013 +0100 | Leo Gordon | use SqlSchemaAdaptor to detect the current code's sql version and record it in 'hive_meta' (leave it out of tables.*sql* files)
* Wed Jun 26 17:32:22 2013 +0100 | Leo Gordon | A new "adaptor" for detection of software's sql version based on the number of available sql patches.
* Tue Jun 25 10:35:25 2013 +0100 | Leo Gordon | move Core 'schema_version' out of tables.* files into HiveGeneric_conf (via ApiVersion), expose it for manipulation and make it available to PipeConfigs
* Tue Jun 25 17:08:48 2013 +0100 | Leo Gordon | re-based MetaContainer (now it has two parents, NakedTableAdaptor is first); using the new version
* Tue Jun 25 17:04:04 2013 +0100 | Leo Gordon | new method(s) to remove objects/rows by a given condition
* Tue Jun 25 10:26:25 2013 +0100 | Leo Gordon | new 'hive_meta' table to keep hive_sql_schema_version (=number of patches), hive_pipeline_name and hive_use_triggers
* Wed Jun 26 16:55:34 2013 +0100 | Leo Gordon | Changed an 'our' global variable to ENV{EHIVE_ROOT_DIR} to allow API-only users to set it and work as usual
* Tue Jun 25 15:35:16 2013 +0100 | Miguel Pignatelli | added -hive_force_init option to documentation
* Tue Jun 25 11:11:45 2013 +0100 | Leo Gordon | bugfix: make sure users' tweaking of Data::Dumper::Maxdepth does not mess up stringify()'s operation
* Mon Jun 24 11:27:33 2013 +0100 | Leo Gordon | cosmetic: moving the sorting of keys into an external subroutine (it will be extended later)
* Mon Jun 24 11:07:25 2013 +0100 | Leo Gordon | Utils/Config.pm no longer depends on ENSEMBL_CVS_ROOT_DIR, which becomes non-essential for non-EnsEMBL applications.
* Fri Jun 21 15:54:28 2013 +0100 | Leo Gordon | bugfix: make sure fetch_all() works with empty tables
* Tue Jun 18 20:11:19 2013 +0100 | Leo Gordon | avoid deadlocks when dataflowing under transactional mode (used in Ortheus Runnable for example)
* Tue Jun 18 18:38:26 2013 +0100 | Leo Gordon | print the failed query
---------------------[after EnsEMBL rel.72]---------------------------------------------------------------------------
* Fri Jun 14 15:17:45 2013 +0100 | Leo Gordon | PostgreSQL: connection parameters are now supplied on the command line (no need to set PG variables by hand)
* Thu Jun 13 16:48:01 2013 +0100 | Leo Gordon | given -job_id Scheduler should take the Analysis into account and only submit a Worker for this Analysis
* Thu Jun 13 16:08:12 2013 +0100 | Leo Gordon | renamed some old patch files so that they would all conform to the same naming format
* Thu Jun 13 16:02:23 2013 +0100 | Leo Gordon | Adding foreign keys to PostgreSQL schema by reusing the MySQL file (the syntax happens to be exactly the same!)
* Thu Jun 13 15:50:38 2013 +0100 | Leo Gordon | Rename tables.sql to tables.mysql (less confusion)
* Thu Jun 13 15:47:15 2013 +0100 | Leo Gordon | allow the accumulated values to be longer than 255 characters
* Thu Jun 13 15:34:40 2013 +0100 | Leo Gordon | synchronized all 3 schema files
* Wed Jun 12 12:21:00 2013 +0100 | Leo Gordon | First attempt to support PostgreSQL in eHive. Use with caution.
* Mon Jun 10 17:00:31 2013 +0100 | Leo Gordon | experimental support for undef values in default_options
* Mon Jun 10 11:25:36 2013 +0100 | Leo Gordon | make sure both DatabaseDumper.pm and drop_hive_tables() know about the 'accu' table
* Mon Jun 10 09:54:38 2013 +0100 | Leo Gordon | report job_id of a created job (STDOUT) or warn that it had been created before (STDERR)
* Thu Jun 6 17:18:11 2013 +0100 | Leo Gordon | sqlite mode now also supports "-hive_force_init 1" flag
* Thu Jun 6 11:50:40 2013 +0100 | Leo Gordon | bugfix: correct destringification of a single undef on a line
* Wed Jun 5 17:11:18 2013 +0100 | Leo Gordon | Slow the example down a bit and allow 2 Workers. In "-can_specialize 1" mode the two Workers will complete the whole pipeline.
* Wed Jun 5 17:08:33 2013 +0100 | Leo Gordon | Improved output to distinguish multiple Workers' output in the same stream
* Wed Jun 5 11:31:17 2013 +0100 | Leo Gordon | setting "-hive_force_init 1" will cause init_pipeline.pl to drop the database prior to creation (use with care!)
* Tue Jun 4 17:03:05 2013 +0100 | Leo Gordon | added support for stringification/destringification of accumulated values (an element is allowed to be a complex structure)
* Mon Jun 3 22:28:28 2013 +0100 | Leo Gordon | now supports sleeping for a floating-point number of seconds; take_time can be given by a runtime-computed formula such as "1+rand(1)/1000"
* Mon Jun 3 14:12:27 2013 +0100 | Leo Gordon | stop complaining about undefined take_time parameter (set it to 0 by default)
* Mon Jun 3 14:05:53 2013 +0100 | Leo Gordon | Added optional sleeping functionality to Dummy runnable
* Mon Jun 3 11:46:27 2013 +0100 | Leo Gordon | a presentation introducing accumulated dataflow concept
* Sat Jun 1 21:31:34 2013 +0100 | Leo Gordon | added description attribute to Limiter class
* Thu May 30 16:01:33 2013 +0100 | Leo Gordon | bugfix: both queries modifying semaphore_count are wrapped in protected_prepare_execute
* Wed May 29 16:13:09 2013 +0100 | Leo Gordon | bugfix: allow #expr(...)expr# to properly override in the templates as well
* Tue May 28 16:29:23 2013 +0100 | Leo Gordon | Simplified logic to decide whether Scheduler needs a resync. Temporarily ignore limiters and look at the number of workers initially required.
* Tue May 28 15:13:56 2013 +0100 | Leo Gordon | finally implemented LSF's version of count_running_workers() and a Valley aggregator for all visible meadows
* Tue May 28 13:09:39 2013 +0100 | Leo Gordon | bugfix: make sure specializing workers wait while their analysis is being sync'ed
* Tue May 28 12:34:51 2013 +0100 | Leo Gordon | No need to pass $total_workers_to_submit back to beekeeper anymore.
* Thu May 23 10:10:14 2013 +0100 | Leo Gordon | cosmetic: make it explicit that we are importing rearrange() and throw()
* Wed May 22 12:43:00 2013 +0100 | Leo Gordon | removed dependency on check_ref and assert_ref
* Wed May 22 11:13:38 2013 +0100 | Leo Gordon | Hive is no longer directly dependent on BioPerl
* Thu May 16 16:37:49 2013 +0100 | Leo Gordon | All Hive scripts now detect $::hive_root_dir and use it for setting the @INC so manual setting of PERL5LIB is only needed if using API directly
* Tue May 14 16:55:38 2013 +0100 | Leo Gordon | make sure beekeeper.pl runs runWorker.pl from its own scripts directory (ignore the one in the path)
* Tue May 14 16:14:47 2013 +0100 | Leo Gordon | allow the user to choose a particular hive_root_dir (esp. if there are many)
* Thu May 9 13:55:40 2013 +0100 | Leo Gordon | Copied the @-tag annotation from tables.sql to tables.sqlite. Unlike the original mysql version, the SQLite version gives no warnings when processed by sql2html.pl
* Fri May 3 14:46:03 2013 +0100 | Leo Gordon | bugfix: some farms have non-alphanumeric characters in their cluster name
* Wed May 1 11:48:23 2013 +0100 | Leo Gordon | added a new protected_prepare_execute() method to avoid deadlocks and used it twice in AnalysisJobAdaptor, to fix Stephen's deadlocks
* Wed May 1 11:46:48 2013 +0100 | Leo Gordon | moved Hive's extensions to DBConnection into a separate Hive::DBSQL::DBConnection class
* Wed May 1 12:03:10 2013 +0100 | Leo Gordon | added a patch to add 'accu' table to an existing database & fixed sqlite schema
* Tue Apr 30 13:12:33 2013 +0100 | Leo Gordon | updated schema documentation to reflect addition of 'accu' table
* Tue Apr 30 12:48:09 2013 +0100 | Leo Gordon | added support to generate_graph.pl to show accumulated dataflow on the diagram
* Tue Apr 30 11:38:44 2013 +0100 | Leo Gordon | bugfix: do not crash on encountering accumulated dataflow (just ignore it for the moment); work correctly in DisplayStretched mode
* Mon Apr 29 17:12:17 2013 +0100 | Leo Gordon | Modified the LongMult example to use accumulated dataflow
* Mon Apr 29 17:07:56 2013 +0100 | Leo Gordon | added schema & API support for accumulated dataflow
* Tue Apr 23 15:35:35 2013 +0100 | Leo Gordon | changed schema version to 72
---------------------[before EnsEMBL rel.72]---------------------------------------------------------------------------
* Tue Apr 23 14:50:55 2013 +0100 | Leo Gordon | bugfix: only create 'default' resource_class if it was not actually stored in the database
* Tue Apr 23 13:08:44 2013 +0100 | Leo Gordon | bugfix: check before storing rc (may be necessary in -analysis_topup mode) and warn about consequences of redefining it.
* Tue Apr 23 13:05:37 2013 +0100 | Leo Gordon | API extension: store() now also returns how many actual store operations (as opposed to fetching of already stored ones) it has performed
* Fri Apr 12 16:43:19 2013 +0100 | Leo Gordon | tables.sql was made compatible with Core/Production sql2html.pl and the result is kept in docs/
* Mon Apr 8 12:20:29 2013 +0100 | Miguel Pignatelli [prf1] | Runtime is recorded for failing jobs
* Wed Mar 27 12:16:35 2013 +0000 | Javier Herrero | Added 22 Feb 2013 eHive workshop slides and examples to docs/presentation/
* Tue Mar 26 15:40:19 2013 +0000 | Leo Gordon | Make sure we do not create an analysis with non-hash parameters
* Mon Mar 25 11:05:00 2013 +0000 | Leo Gordon | use param_required() calls wherever a parameter value is required
* Fri Mar 22 16:50:42 2013 +0000 | Leo Gordon | Back to num_required_workers' meaning "how many extra workers we need to add to this analysis"; fixing a scheduling bug/oversensitivity to manual change of batch_size
* Fri Mar 22 15:44:55 2013 +0000 | Leo Gordon | Moved runnable checks into a separate method Analysis::get_compiled_module_name()
* Wed Mar 20 22:44:04 2013 +0000 | Leo Gordon | Do not crash when asked to param_substitute a Regexp, but issue a warning
* Wed Mar 20 13:02:12 2013 +0000 | Leo Gordon | free 'Start' from dealing with 'a_multiplier' by using an input_id_template in PipeConfig instead; renamed 'Start' to 'DigitFactory' to reflect that
* Wed Mar 20 10:35:08 2013 +0000 | Leo Gordon | A new and friendlier README file; defines main concepts and provides contact data
* Thu Mar 14 09:15:53 2013 +0000 | Leo Gordon | bugfix: added missing quotes
* Tue Mar 12 21:45:23 2013 +0000 | Leo Gordon | A 3-analysis pipeline with almost exclusive use of #substitution#; mysql_conn() and mysql_dbname() modified to transform urls as well
* Tue Mar 12 12:06:37 2013 +0000 | Leo Gordon | Improved legend with useful commands
* Tue Mar 12 10:56:55 2013 +0000 | Leo Gordon | The smallest Hive pipeline example possible. Just one SystemCmd-based analysis.
* Mon Mar 11 23:59:20 2013 +0000 | Leo Gordon | A cleaner example of a two-analysis pipeline with better demonstration of #substitution# and only implicit $self->o() references
* Mon Mar 11 21:13:58 2013 +0000 | Leo Gordon | moved 'go_figure_dbc()' into Utils; supplied defaults for MySQLTransfer to make it quiet
* Tue Mar 12 21:04:14 2013 +0000 | emepyc | This file is now JSON strict
* Tue Mar 12 13:50:33 2013 +0000 | Matthieu Muffato | Do not buffer the resultset (only tested with MySQL)
* Tue Mar 12 11:07:23 2013 +0000 | Matthieu Muffato | bugfix: <= instead of <
* Fri Mar 8 18:41:39 2013 +0000 | Matthieu Muffato | In "topup" mode, concurrent inserts make the row count unreliable
* Tue Mar 5 17:05:21 2013 +0000 | Leo Gordon | Protect generate_graph.pl in table-drawing mode from printing too many rows (by setting a limit in JSON config)
* Tue Mar 5 13:12:32 2013 +0000 | Leo Gordon | Protect generate_graph.pl in job-drawing mode from printing too many jobs (by setting a limit in JSON config)
* Tue Mar 5 13:10:38 2013 +0000 | Leo Gordon | extend a method in JobAdaptor to return a limited number of jobs (for use in generate_graph)
* Fri Mar 1 11:53:39 2013 +0000 | Matthieu Muffato | Fixed a memory leak in data_dbc()
* Thu Feb 28 15:41:46 2013 +0000 | Leo Gordon | cosmetic: renamed README.txt back to README to retain an unbroken history in CVS
* Thu Feb 28 15:37:42 2013 +0000 | Leo Gordon | cosmetic: added new commits to README and renamed it Changelog; split out the old README.txt (non-Changelog part)
---------------------[before and during EnsEMBL rel.71]----------------------------------------------------------------
* Thu Feb 28 10:12:41 2013 +0000 | Leo Gordon | avoid having beekeeper run in submitted-to-the-farm state - detect it, report and quit
* Thu Feb 28 09:47:40 2013 +0000 | Leo Gordon | param_substitution is now default everywhere, no need to call it explicitly
* Thu Feb 28 09:42:33 2013 +0000 | Leo Gordon | added param_required() and param_is_defined() interfaces to Process
* Wed Feb 27 21:34:47 2013 +0000 | Leo Gordon | bugfix: updated examples of how to use JobFactory without and with input_id_template
* Wed Feb 27 19:08:40 2013 +0000 | Leo Gordon | bugfix: changed implementation of data_dbc() to correctly compare things before caching
* Wed Feb 27 14:00:42 2013 +0000 | Leo Gordon | Clone::clone is no longer used, so dependency has been removed
* Fri Feb 22 16:55:12 2013 +0000 | Matthieu Muffato | It is more efficient to give MySQL a LIMIT clause
* Sat Feb 23 00:52:57 2013 +0000 | Leo Gordon | JobFactory uses $overriding_hash to create jobs/rows from input_id_template; 'input_id' parameter deprecated; standaloneJob supports templates.
* Sat Feb 23 00:49:15 2013 +0000 | Leo Gordon | Substitution machinery now supports an extra $overriding_hash that contains parameters with higher precedence than the whole of param() structure
* Fri Feb 22 16:36:19 2013 +0000 | Leo Gordon | fixed several problems with parameter substitution and detection of undefs; added param_required() and param_is_defined()
* Fri Feb 22 10:42:51 2013 +0000 | Leo Gordon | reload the cached data_dbc() value on change of param('db_conn')
* Thu Feb 21 16:14:35 2013 +0000 | emepyc | The modules of the analyses must be accessible
* Fri Feb 15 17:05:20 2013 +0000 | Matthieu Muffato | New runnable to check the size of the resultset of any SQL query
* Tue Feb 19 17:18:06 2013 +0000 | Leo Gordon | removed param_substitute() call from Runnables -- no longer needed, as substitution is automatic
* Tue Feb 19 16:46:05 2013 +0000 | Leo Gordon | a "total" (anything-to-anything) substitution mechanism has been implemented in Hive::Params
* Fri Feb 15 17:04:36 2013 +0000 | Matthieu Muffato | The preferred meadow type must be registered
* Fri Feb 15 17:03:24 2013 +0000 | Matthieu Muffato | The modules of the analysis must be loadable
* Fri Feb 15 17:01:27 2013 +0000 | Matthieu Muffato | In dataflow rules within the same database, the destination analysis must exist
* Fri Feb 15 17:00:38 2013 +0000 | Matthieu Muffato | In control rules within the same database, the condition analysis must exist
* Fri Feb 15 22:25:39 2013 +0000 | Leo Gordon | make sure all LSF pids are quoted, to protect them from tcsh interpretation of square brackets
* Thu Feb 14 16:41:49 2013 +0000 | Leo Gordon | a new script to remove old "DONE" jobs and associated job_file and log_message entries
* Thu Feb 14 10:45:26 2013 +0000 | Leo Gordon | seed_pipeline.pl now shows examples of input_ids of seedable analyses
* Thu Feb 14 09:54:00 2013 +0000 | Leo Gordon | Made $final_clause a parameter of _generic_fetch() & removed default ORDER-BY; hopefully faster
* Wed Feb 13 17:20:14 2013 +0000 | Leo Gordon | In case no -logic_name/-analysis_id was supplied, show the list of analyses that have no incoming dataflow (and so are candidates for seeding)
* Wed Feb 13 13:52:56 2013 +0000 | Leo Gordon | Added perldoc to seed_pipeline.pl script
* Wed Feb 13 13:35:55 2013 +0000 | Leo Gordon | A new script to quickly seed any analysis of any pipeline.
* Wed Feb 13 10:19:08 2013 +0000 | Leo Gordon | hide the calls to URLFactory into the DBAdaptor's constructor
* Tue Feb 12 10:22:02 2013 +0000 | Leo Gordon | hash of resources no longer depends on default_meadow (bugfix)
* Thu Feb 7 11:42:11 2013 +0000 | Kathryn Beal | Updated to release 71
* Wed Feb 6 17:43:21 2013 +0000 | Matthieu Muffato | Tables must be in the right order. Otherwise, the foreign key checks complain
* Fri Jan 25 19:42:28 2013 +0000 | Leo Gordon | resolving conflict: using mine
|\
| * Tue Jan 15 11:03:26 2013 +0000 | Matthieu Muffato | Table dataflows are now included into semaphore boxes (bugfix: wrong internal name)
* | Fri Jan 25 19:26:36 2013 +0000 | Leo Gordon | diagram improvement: (1) no more "empty boxes" and (2) tables dataflown from a box are shown in their boxes
|/
* Mon Jan 14 13:23:52 2013 +0000 | Leo Gordon | Added a new presentation, moved presentations into a separate folder.
* Fri Jan 11 11:19:11 2013 +0000 | Leo Gordon | cosmetic fix: commented back the debug output that was left uncommented by mistake
* Fri Jan 11 11:07:47 2013 +0000 | Leo Gordon | Added coloured barchart display option and jobs/data display option (no big data checks, use with care on small examples). 'Pad' is now configurable from JSON. Beware: JSON config options have moved around!
* Thu Jan 10 16:14:06 2013 +0000 | Leo Gordon | injected a padding around the pipeline diagram
* Fri Jan 4 17:03:14 2013 +0000 | Leo Gordon | send the fatal "COULDNT CREATE WORKER" message to stderr instead of stdout
* Fri Jan 4 15:10:47 2013 +0000 | Leo Gordon | added command line options -submit_stdout_file and -submit_stderr_file to peek into submission output/error streams
* Fri Jan 4 14:51:35 2013 +0000 | Leo Gordon | using Perl's File::Path::make_path instead of "mkdir -p" to create hive_log_dir
* Fri Jan 4 11:03:31 2013 +0000 | Leo Gordon | added a LongMult pipeline diagram in completed state (for easier reference)
* Mon Dec 17 12:13:43 2012 +0000 | Leo Gordon | fixed a bug in computing num_required_workers according to the new rules (thanks Matthieu for reporting)
* Wed Dec 12 14:41:16 2012 +0000 | Leo Gordon | bugfix: correctly checking analysis_capacity (thanks Andy for reporting)
* Wed Dec 12 10:44:01 2012 +0000 | Leo Gordon | bugfix: do not proceed with negative numbers of workers for submission (thanks to Matthieu for reporting)
* Thu Dec 6 11:18:59 2012 +0000 | Leo Gordon | bugfix: -job_limit now works correctly also with respecializing workers
* Thu Dec 6 10:47:07 2012 +0000 | Leo Gordon | fix: 'msg' view now displays the analysis of the job (which is fixed), not that of worker (which may change with time)
* Wed Dec 5 22:25:35 2012 +0000 | Leo Gordon | experimental feature: re-specialization of workers instead of dying from NO_WORK
* Sat Dec 1 19:11:56 2012 +0000 | Leo Gordon | switched to using Limiter class for job_limit and made some related structural changes
* Fri Nov 30 13:47:42 2012 +0000 | Leo Gordon | changed the meaning of 'num_required_workers' to "total estimated number of workers needed for this analysis" ( 'num_running_workers' is now included in it )
* Thu Nov 29 12:21:22 2012 +0000 | Leo Gordon | fresh schema diagram
* Thu Nov 29 11:46:47 2012 +0000 | Leo Gordon | renamed 'job_message' table to 'log_message' and JobMessageAdaptor to LogMessageAdaptor everywhere
* Wed Nov 28 21:40:45 2012 +0000 | Leo Gordon | swapped hive_capacity for analysis_capacity in example PipeConfig files
* Wed Nov 28 21:30:44 2012 +0000 | Leo Gordon | change of default behaviour: hive_capacity is now off by default (=NULL); setting hive_capacity=0 or analysis_capacity=0 stops scheduling AND specialization to a particular analysis
* Wed Nov 28 13:23:48 2012 +0000 | Leo Gordon | cleanup: removed runnable(), output() and parameters() subroutines from Process as no longer used by Compara
* Wed Nov 28 12:21:37 2012 +0000 | Leo Gordon | removed the "compile_module_once" option as the only way to compile modules now is once after specialization
* Tue Nov 27 11:31:00 2012 +0000 | Leo Gordon | security: make sure stringify() always produces perl-parsable structures, so that global settings of Data::Dumper do not affect its results (thanks to Uma and Matthieu for reporting)
---------------------[during EnsEMBL rel.70]----------------------------------------------------------------
* Fri Nov 23 14:26:53 2012 +0000 | Leo Gordon | bugfix: create meadow_capacity limiters whether or not there is a limit
* Thu Nov 22 21:26:37 2012 +0000 | Leo Gordon | added a new per-analysis "analysis_capacity" limiter for cases where users want to limit analyses independently
* Thu Nov 22 16:56:36 2012 +0000 | Leo Gordon | switch the Scheduler to using universal Limiter objects (cleaner code, more precise computation and should allow for expansion)
* Thu Nov 22 14:07:21 2012 +0000 | Leo Gordon | moved pending adjustment out of the main scheduling subroutine, which simplified the logic and improved readability
* Thu Nov 22 17:21:22 2012 +0000 | Leo Gordon | Introduced a new 'NO_ROLE' cause_of_death for failures during specialization (not so much of an error, really!)
* Fri Nov 23 11:16:12 2012 +0000 | Leo Gordon | bugfix: avoid specializing in an otherwise BLOCKED analysis that is temporarily in SYNCHING state (thanks to Kathryn for reporting)
* Wed Nov 21 12:23:11 2012 +0000 | Leo Gordon | (multi-meadow scheduler) restrict the set of analyses that a worker with a given meadow_type can specialize into
* Tue Nov 20 15:35:44 2012 +0000 | Leo Gordon | separated the Scheduler's code into a separate module (not an object yet)
* Tue Nov 20 16:57:23 2012 +0000 | Matthieu Muffato | Merge branch 'master' of git.internal.sanger.ac.uk:/repos/git/ensembl/compara/ensembl-hive
|\
| * Tue Nov 20 12:35:30 2012 +0000 | Leo Gordon | bugfix: if re-running a job that creates a semaphored group, we no longer die (thanks Miguel for reporting)
| * Mon Nov 19 16:25:14 2012 +0000 | Leo Gordon | Added API and schema support for analysis_base.meadow_type / Analysis->meadow_type(), which will be NULL/undef by default
| * Mon Nov 19 15:22:44 2012 +0000 | Leo Gordon | proof of concept: all structures passed into calls and back are now meadow-aware
| * Fri Nov 16 13:44:01 2012 +0000 | Leo Gordon | pass complete valley-wide stats into schedule_workers without filtering
| * Fri Nov 16 10:36:49 2012 +0000 | Leo Gordon | aggregate meadow stats collection in the Valley
* | Mon Nov 19 22:16:26 2012 +0000 | Matthieu Muffato | Merge branch 'master' of git.internal.sanger.ac.uk:/repos/git/ensembl/compara/ensembl-hive
|\ \
| |/
| * Fri Nov 16 23:27:58 2012 +0000 | Leo Gordon | turn Utils::Graph into Configurable and use the same interface to config as Meadow and Valley
* | Sun Nov 18 11:59:06 2012 +0000 | Matthieu Muffato | All the combinations of parameters are tested and cover all possible cases
|/
* Fri Nov 16 15:03:19 2012 +0000 | Leo Gordon | bugfix: no longer leaves CLAIMED jobs after compilation error during specific -job_id execution
* Fri Nov 16 14:29:48 2012 +0000 | Leo Gordon | bugfix: min_batch_time moved to prevent infinite loop in -compile_module_once 0 mode
* Fri Nov 16 12:11:01 2012 +0000 | Leo Gordon | make Valley into Configurable and move SubmitWorkersMax into Valley's context, because it is more "global" than a Meadow
* Fri Nov 16 11:52:51 2012 +0000 | Leo Gordon | concentrate the "Configurable" functionality in one class with the intention to use it wider
* Fri Nov 16 10:48:01 2012 +0000 | Leo Gordon | meadow->signature() is slightly more useful than meadow->toString()
* Thu Nov 15 12:08:11 2012 +0000 | Leo Gordon | removed PendingAdjust option from beekeeper and config file as it never really needs to be unset
* Thu Nov 15 10:37:01 2012 +0000 | Leo Gordon | simplification of the interface: scripts no longer understand --user/--password/--host/--port/--database and require --url instead
* Tue Nov 13 15:19:29 2012 +0000 | Leo Gordon | capture Worker's death message during the new 'SPECIALIZATION' status in job_message/msg (thanks, Thomas!)
* Tue Nov 13 13:07:26 2012 +0000 | Leo Gordon | bugfix: msg view should behave when analysis_id is still NULL
* Tue Nov 13 11:06:01 2012 +0000 | Leo Gordon | feature: jobless workers will now leave module compilation errors in the job_message table (thanks, Kathryn!)
---------------------[before EnsEMBL rel.70]----------------------------------------------------------------
* Mon Nov 12 14:15:40 2012 +0000 | Leo Gordon | updated the release number to 70 in the schema
* Fri Nov 9 13:59:24 2012 +0000 | Leo Gordon | bugfix: worker.log_dir varchar(80) was too limiting, now extended to varchar(255); (thanks, Kathryn!)
* Fri Nov 9 12:05:28 2012 +0000 | Leo Gordon | bugfix: make sure we release claimed jobs from a manually-run worker whose Runnable fails at compilation (thanks, Miguel!)
* Thu Nov 8 10:50:51 2012 +0000 | Leo Gordon | job_count_breakout now also returns the components that go into the breakout_label
* Tue Nov 6 12:55:26 2012 +0000 | Leo Gordon | bugfix: now works on patched schema too
* Tue Nov 6 12:52:34 2012 +0000 | Leo Gordon | substituted fetch_all_failed_jobs() by a more versatile fetch_all_by_analysis_id_status()
* Tue Nov 6 12:23:45 2012 +0000 | Leo Gordon | move job_count_breakout code into AnalysisStats to be called centrally
* Fri Nov 2 14:23:13 2012 +0000 | Leo Gordon | quote and env-substitute runWorker.pl's -url commandline parameter
* Fri Nov 2 15:14:57 2012 +0000 | Leo Gordon | parametrically slow down the LongMult test pipeline using -take_time global parameter
* Fri Nov 2 10:03:39 2012 +0000 | Leo Gordon | cosmetic: removed CVS magic $_Revision and $_Author variables that cause CVS to go out of sync with Git
* Fri Nov 2 09:59:09 2012 +0000 | Leo Gordon | cosmetic: added a short summary of Git commits to Changelog for CVS-only users
* Thu Nov 1 15:59:55 2012 +0000 | Leo Gordon | bugfix: query in Q::fetch_all_dead_workers_with_jobs() has to reference worker table by its full name
* Thu Nov 1 15:31:36 2012 +0000 | Leo Gordon | clearer display of job_counters in beekeeper's output
* Thu Nov 1 15:16:08 2012 +0000 | Leo Gordon | clearer display of job_counters on the graph; removed misleading and unused remaining_job_count() and cpu_minutes_remaining()
* Thu Nov 1 14:33:42 2012 +0000 | Leo Gordon | Merge branch 'bugfix_greedy_grep'
|\
| * Thu Nov 1 12:05:35 2012 +0000 | Leo Gordon | avoid grepping out lines by patterns potentially present in job_name_prefix
* | Thu Nov 1 12:00:00 2012 +0000 | Leo Gordon | bugfix: only limit buried-in-haste workers to really dead ones
|/
* Wed Oct 31 13:22:46 2012 +0000 | Leo Gordon | fixing permissions of all files in one go
* Wed Oct 31 13:19:14 2012 +0000 | Leo Gordon | Do not expose the password in workers' url by storing it in an environment variable
---------------------[after EnsEMBL rel.69]----------------------------------------------------------------
2012-10-19 15:45 lg4
* sql/tables.sql: better match heavy queries with indices on job
table
2012-10-19 15:43 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, DBSQL/AnalysisJobAdaptor.pm:
merge reset_and_grab into one subroutine; pre-increment dependent
semaphore if re-running a DONE job; use -force flag for
force-running an individual job
2012-10-19 15:40 lg4
* scripts/beekeeper.pl: propagation of -force flag through
beekeeper.pl
2012-10-17 12:55 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, Worker.pm,
DBSQL/AnalysisJobAdaptor.pm: moved special-job-reset and
special-job-reclaim into the same call, removed the unnecessary
fetch in between
2012-10-16 12:37 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: cosmetic
changes
2012-10-16 10:42 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/beekeeper.pl: try not
to shock the Q::register_worker_death() code with inexistent
W->analysis_id
2012-10-16 10:26 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/runWorker.pl: moved
specialization call into W::run, so that death messages during
specialization could be recorded in W->log_dir
2012-10-15 16:06 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: print the resource_class_id
of the worker
2012-10-15 16:04 lg4
* scripts/beekeeper.pl: pass either rc_name or logic_name or job_id
from beekeeper.pl to runWorker.pl
2012-10-15 10:44 mm14
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: bugfix:
$analysis instead of $self->analysis
2012-10-15 10:42 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: set compile_module_once=1 as
default
2012-10-13 12:31 lg4
* modules/Bio/EnsEMBL/Hive/: URLFactory.pm,
PipeConfig/HiveGeneric_conf.pm: allow database names to contain
dashes
2012-10-13 11:02 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/runWorker.pl,
sql/patch_2012-10-13.sql, sql/tables.sql, sql/tables.sqlite: if
runWorker.pl is run manually, rc_name may stay NULL in the
database
2012-10-12 21:24 lg4
* docs/: hive_schema.mwb, hive_schema.png: updated schema diagram
with worker.resource_class_id
2012-10-12 17:15 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/runWorker.pl,
sql/foreign_keys.mysql, sql/patch_2012-10-12.sql, sql/tables.sql,
sql/tables.sqlite: separating create_new_worker() from
specialize_new_worker()
2012-10-11 12:37 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
sql/triggers.mysql, sql/triggers.sqlite: proper counting of
semaphored jobs by triggers and in constructor
2012-10-10 14:45 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/runWorker.pl:
refactoring of the Q::create_new_worker() and introduction of
-force flag
2012-10-10 14:36 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm: we should
not leave SYNCHING analysis out (especially if there are not too
many READY analyses)
2012-10-10 14:34 lg4
* modules/Bio/EnsEMBL/Hive/RunnableDB/LongMult/PartMultiply.pm:
slow things down a little
2012-10-09 10:48 lg4
* docs/hive_schema.mwb, docs/hive_schema.png,
sql/foreign_keys.mysql: added a DF-to-DF foreign key and
refreshed the diagram
2012-10-09 10:25 lg4
* sql/tables.sqlite: bugfix: forgot to add semaphored_job_count to
SQLite schema, now included
2012-10-09 10:22 lg4
* sql/: patch_2012-10-08.sql, tables.sql, tables.sqlite: turned two
unique keys into primary keys (needed by BaseAdaptor)
2012-10-08 16:06 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm: allow the
batch_size to be updated via
$analysis_stats_adaptor->update($stats);
2012-10-08 12:17 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/runWorker.pl: removed
the input_id functionality from runWorker as both redundant
(standaloneJob) and probably not working
2012-10-08 12:13 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: those
"return" statements would have never worked anyway, so I removed
them
2012-10-05 16:14 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: extend the
param_init() of the garbage-collected jobs to include
analysis->parameters() for template substitution (still limited!)
2012-10-05 14:14 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, sql/patch_2012-10-06.sql,
sql/tables.sql, sql/tables.sqlite: cause_of_death="" no longer
used for decision making, cause_of_death IS NULL by default and
FATALITY renamed UNKNOWN for clarity
2012-10-05 10:09 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: fetch_failed_workers() is
dropped as no longer used, get_hive_current_load() cosmetically
touched
2012-10-04 16:47 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
sql/patch_2012-10-05.sql, sql/tables.sql, sql/tables.sqlite:
EMPTY state added and definitions of READY and WORKING made more
intuitive
2012-10-04 15:45 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm: bugfix: a typo
2012-10-04 15:39 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
scripts/beekeeper.pl, sql/patch_2012-10-04.sql, sql/tables.sql,
sql/tables.sqlite, sql/triggers.mysql, sql/triggers.sqlite:
introduced semaphored_job_count, renamed
unclaimed_job_count-->ready_job_count, changed reporting, fixed
hive_capacity=0
2012-10-03 14:55 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm,
DBSQL/AnalysisStatsAdaptor.pm: common denominator for
schedule_workers and specialize_new_worker
2012-10-03 14:11 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, DBSQL/AnalysisAdaptor.pm:
Fetching data via AnalysisAdaptor allows printing logic_names of
failed analyses
2012-10-03 11:09 lg4
* scripts/runWorker.pl: print stats if could not create worker
anyway, but do not sync in the end (too cryptic)
2012-10-03 10:51 lg4
* modules/Bio/EnsEMBL/Hive/: DBSQL/AnalysisJobAdaptor.pm,
PipeConfig/HiveGeneric_conf.pm: renamed -input_job_id to
-prev_job_id to be in sync with other names
2012-10-02 16:47 lg4
* docs/: hive_schema.mwb, hive_schema.png: updated schema diagrams
2012-10-02 16:18 lg4
* modules/Bio/EnsEMBL/Hive/Analysis.pm,
modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
sql/patch_2012-10-02.sql, sql/tables.sql, sql/tables.sqlite:
moved failed_job_tolerance, max_retry_count, can_be_empty and
priority columns from analysis_stats to analysis_base
2012-10-02 14:56 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: bugfix: do
not forget PRE_- and POST_CLEANUP states
2012-10-02 13:00 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: bugfix:
changed the order of the atomic SEMAPHORED->READY state&counter
UPDATE so that it works as intended in SQLite as well
2012-10-02 12:17 lg4
* sql/tables.sql: added a fake default to last_update field
(required by stricter MySQL setup of Vega)
2012-10-02 11:48 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
sql/tables.sql, sql/tables.sqlite: added specific defaults into
analysis_stats and analysis_stats_monitor; re-ordered the fields
for easier navigation
2012-10-01 15:58 lg4
* modules/Bio/EnsEMBL/Hive/: Worker.pm, DBSQL/BaseAdaptor.pm:
bugfix: fetch_by_dbID should work now (thanks to ChuangKee and
Miguel)
2012-10-01 12:53 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm: bugfix:
primary_key_constraint now works (thanks to Miguel!)
2012-09-28 11:01 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: bugfix:
typo fixed, thanks to Miguel for pointing out!
2012-09-27 16:48 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisStats.pm,
DBSQL/AnalysisStatsAdaptor.pm, PipeConfig/HiveGeneric_conf.pm:
make AnalysisStats a rearrangeable EnsEMBL-style constructor, a
proper store method and other preparations
2012-09-27 15:29 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm:
simplification of DYNAMIC hive_capacity update code
2012-09-27 12:03 lg4
* scripts/cmd_hive.pl: retired the cmd_hive.pl script; likely not
working and duplicating functionality of more flexible PipeConfig
2012-09-27 10:50 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: bugfix:
also release jobs that were in PRE_CLEANUP or POST_CLEANUP states
2012-09-26 15:03 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, DBSQL/BaseAdaptor.pm:
switched Queen to become a descendant of Hive::DBSQL::ObjectAdaptor
and removed _generic_fetch from it
2012-09-26 12:31 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, Worker.pm: rearranged
Worker's storable getters/setters, introduced and used a proper
rearranging new() method
2012-09-26 11:27 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, Worker.pm: Worker doesn't
really need its own reference to db (can go via adaptor)
2012-09-25 16:20 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: bugfix: GROUP BY now includes
a proper prefix of the index
2012-09-25 16:04 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm,
sql/patch_2012-09-25.sql, sql/tables.sql, sql/tables.sqlite:
Dropped 'BLOCKED' job status and introduced 'SEMAPHORED' status
that is maintained in sync with semaphore_counts; less confusing
and more efficient (with new 3-part index)
2012-09-25 12:32 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/beekeeper.pl,
scripts/runWorker.pl, sql/patch_2012-09-24.sql, sql/tables.sql,
sql/tables.sqlite: record each Worker's log_dir in the database;
simplified the log_dir code and renamed cmdline options
accordingly
2012-09-21 22:16 lg4
* docs/hive_schema.mwb, docs/hive_schema.png,
modules/Bio/EnsEMBL/Hive/Analysis.pm,
modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
scripts/lsf_report.pl, sql/foreign_keys.mysql,
sql/patch_2012-09-21.sql, sql/tables.sql, sql/tables.sqlite:
moved resource_class_id from analysis_stats and
analysis_stats_monitor to analysis_base
2012-09-21 14:46 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/: AnalysisJobAdaptor.pm,
AnalysisStatsAdaptor.pm: fetch_all never seems to be executed for
these adaptors
2012-09-21 09:34 lg4
* modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm: bugfix: better parsing of
the LSF-job-name
2012-09-20 15:56 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm,
DBSQL/AnalysisStatsAdaptor.pm: optimization: worker should not
sync analyses it is not ready to run
2012-09-20 11:51 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/beekeeper.pl:
replaced internal rc_id by rc_name in the Meadow code and in most
of the Scheduler; needs testing
2012-09-20 11:44 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm: allow JOIN to
appear in the constraint and act wisely - so we do not need extra
complicated syntax for joining
2012-09-07 11:20 lg4
* modules/Bio/EnsEMBL/Hive/Extensions.pm: not ready yet to scrap
the "Runnable" support
2012-09-07 10:29 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: (patch offered by Matthieu)
Allow the Job to kill the Worker even on success
2012-09-05 15:07 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: Updated
the list of eHive tables
2012-09-05 15:00 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, DBSQL/AnalysisJobAdaptor.pm,
DBSQL/AnalysisStatsAdaptor.pm: these methods are already defined
in the parent class
2012-09-05 10:33 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisStats.pm,
DBSQL/AnalysisJobAdaptor.pm, DBSQL/AnalysisStatsAdaptor.pm:
trimmed the commented-out 'use' statements
2012-09-04 17:07 lg4
* modules/Bio/EnsEMBL/Hive/Extensions.pm: slimmed down the
Extensions module a bit (valuable code already moved into
Hive::Analysis)
2012-09-04 17:02 lg4
* docs/hive_schema.mwb, docs/hive_schema.png,
modules/Bio/EnsEMBL/Hive.pm,
modules/Bio/EnsEMBL/Hive/Analysis.pm,
modules/Bio/EnsEMBL/Hive/AnalysisCtrlRule.pm,
modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/DataflowRule.pm,
modules/Bio/EnsEMBL/Hive/Process.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisAdaptor.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
modules/Bio/EnsEMBL/Hive/Utils/Graph.pm, scripts/cmd_hive.pl,
scripts/lsf_report.pl, sql/foreign_keys.mysql,
sql/patch_2012-09-04.sql, sql/procedures.mysql,
sql/procedures.sqlite, sql/tables.sql, sql/tables.sqlite:
substituted the overloaded legacy 'analysis' table by a slimmer
'analysis_base'
2012-09-04 10:09 lg4
* scripts/beekeeper.pl: actually switch to using rc_name in the
workers' commandline
2012-09-03 12:26 lg4
* scripts/beekeeper.pl: make sure beekeeper reports the same
scheduling plans both when it is actually scheduling and in
"reporting" mode
2012-09-03 12:23 lg4
* scripts/lsf_report.pl: adding rc_name to the lsf_report
2012-09-03 12:21 lg4
* modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/beekeeper.pl:
rc_name support in the beekeeper
2012-09-03 12:20 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/runWorker.pl: rc_name
support in the Worker
2012-08-29 09:51 lg4
* modules/Bio/EnsEMBL/Hive/Params.pm: a typo in perldoc
2012-08-28 10:17 lg4
* modules/Bio/EnsEMBL/Hive/: DBSQL/BaseAdaptor.pm, Queen.pm: this
diagnostic information is no longer needed
2012-08-28 10:05 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: Scheduler should explain that
workers are not added because of the pending ones
2012-08-27 14:10 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: eHive
tables are always included unless exclude_ehive is defined
2012-08-25 21:09 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: make
sure default is in the beginning of the list
2012-08-25 10:58 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: Updated
the eHive table list + fixed typo
2012-08-24 15:49 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisJob.pm, DataflowRule.pm,
Process.pm, Queen.pm, ResourceClass.pm, Worker.pm: inherit
Job,Worker,DFR,RC from Bio::EnsEMBL::Storable, reuse some code
2012-08-24 15:44 lg4
* sql/tables.sql: starting the rel69...
2012-08-24 14:38 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: bugfix: make sure there is at
least a number in the query (reported by Matthieu)
2012-08-23 12:01 lg4
* modules/Bio/EnsEMBL/Hive/: ResourceClass.pm,
ResourceDescription.pm: renamed to_string into toString for
uniformity
2012-08-23 10:45 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: Added a
"skip_dump" parameter to ease the restoration of a dump
2012-08-17 15:52 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: trying to be more careful
with references; release jobs' parameters earlier
2012-08-16 12:16 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: Can copy a
database to another database
2012-08-16 12:12 lg4
* modules/Bio/EnsEMBL/Hive/: Process.pm, Worker.pm: change
suggested by Matthieu to avoid crashing if the temp_directory has
already been deleted by Runnable
2012-08-14 11:57 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
scripts/beekeeper.pl, scripts/cmd_hive.pl,
scripts/generate_graph.pl, scripts/runWorker.pl: switch to
module->new() notation everywhere, to simplify text searches
2012-08-03 16:31 lg4
* scripts/ehive_unblock.pl: no longer used as individual jobs are
no longer specifically blocked
2012-08-03 16:22 lg4
* modules/Bio/EnsEMBL/Hive/Process.pm: removed honeycomb support
because it is no longer used by Compara modules
2012-08-03 10:36 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/JobMessageAdaptor.pm: hopefully
will fix the "was not locked with LOCK TABLES" error message
---------------------[before EnsEMBL rel.69]----------------------------------------------------------------
2012-08-01 14:23 lg4
* scripts/: runWorker.pl, standaloneJob.pl: removed the alternative
"nowrite" spelling to simplify interface
2012-07-31 17:01 lg4
* modules/Bio/EnsEMBL/Hive/Process.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/standaloneJob.pl:
made it possible for a standaloneJob to provide Runnables with a
functional worker_temp_directory()
2012-07-31 16:15 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: moved life_cycle() from
Worker.pm into Process.pm and now also calling it from
standaloneJob.pl (actually removed from Worker)
2012-07-31 16:13 lg4
* modules/Bio/EnsEMBL/Hive/Process.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/standaloneJob.pl:
moved life_cycle() from Worker.pm into Process.pm and now also
calling it from standaloneJob.pl
2012-07-25 16:30 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: only add partial timers'
measurement if the job completed successfully
2012-07-24 16:48 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: if running
a worker with a specific job_id, the status is set to READY, but
the retry_count is set depending on whether PRE_CLEANUP is needed
or not
2012-07-24 16:17 lg4
* modules/Bio/EnsEMBL/Hive/Process.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/RunnableDB/FailureTest.pm,
sql/patch_2012-07-23.sql, sql/tables.sql: added two states,
PRE_CLEANUP (conditional) and POST_CLEANUP (unconditional) to the
life cycle of the Job
2012-07-23 16:49 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/RunnableDB/FailureTest.pm,
scripts/standaloneJob.pl, sql/patch_2012-07-22.sql,
sql/tables.sql, sql/tables.sqlite: At last rename GET_INPUT into
FETCH_INPUT for consistency between the schema and the code (it
seems to be harder to patch all the accumulated code)
2012-07-23 12:13 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/beekeeper.pl,
scripts/runWorker.pl: added -compile_modules_once flag to test
the new (slightly faster and more logical) approach
2012-07-16 17:54 mm14
* scripts/lsf_report.pl: rc_id renamed to resource_class_id
2012-07-03 12:06 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
scripts/beekeeper.pl: fixed reset_failed_jobs/reset_all_jobs and
removed remove_analysis_id
2012-06-29 14:20 lg4
* docs/hive_schema.mwb, docs/hive_schema.png,
modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/ResourceDescription.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
scripts/beekeeper.pl, sql/foreign_keys.mysql,
sql/patch_2012-06-29.sql, sql/tables.sql, sql/tables.sqlite:
replaced rc_id by resource_class_id throughout the schema and
added the foreign keys on resource_class_id
2012-06-29 09:41 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: support
'default' as the default resource class (if none is defined) and
create the 'default' rc even if not defined in PipeConfig
2012-06-27 16:17 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisJob.pm,
DBSQL/AnalysisJobAdaptor.pm: simplified logic that controls how
semaphores are propagated (preparing for semaphore escaping rule
support)
2012-06-26 20:53 lg4
* sql/tables.sql: to please MySQL Workbench (does not like boolean,
does not like leading newlines)
2012-06-26 17:02 mm14
* sql/procedures.mysql: resource_description is still there
2012-06-26 16:28 mm14
* sql/procedures.mysql: Added resource_class to the list of removed
tables
---------------------[during EnsEMBL rel.68]----------------------------------------------------------------
2012-06-26 12:58 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: fixed a bug where jobs that
failed in COMPILATION state were still set to DONE status
2012-06-26 11:22 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/beekeeper.pl: slightly
improved output
2012-06-26 11:01 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm, scripts/beekeeper.pl: fixed
and cleaned up the code that outputs a list of workers
2012-06-25 16:00 lg4
* scripts/lsf_report.pl: untested version that corrects the
max(dead) by one minute to include the stats on the last worker
2012-06-22 12:51 mm14
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: With the
"DisplayStretched" option on: now draws the mid-point of the
semaphores next to the boxes instead of under them
2012-06-22 11:51 mm14
* scripts/lsf_report.pl: Now accepts two parameters on the command
line: "start_date" and "end_date"
2012-06-22 11:44 mm14
* scripts/lsf_report.pl: bugfix: now works if the lines in the
bacct output do not start with a space
2012-06-19 16:12 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/: AnalysisJobAdaptor.pm,
AnalysisStatsAdaptor.pm: unnecessary uses
2012-06-15 16:43 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: making
_allocate_to_subgraph() a member function avoids passing
$config as a parameter every time
2012-06-13 17:13 mm14
* hive_config.json, modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: Added
an option to duplicate the tables and include them into their
parent boxes in the graphical output of the pipeline
2012-06-11 12:01 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: removed
commented lines
2012-06-10 10:30 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: Dies if
the db driver is not mysql + fixed a bug that prevented
"table_list" from being parsed
2012-06-08 20:28 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: group boxes based on
funnel rule's midpoint (more boxes)
2012-06-08 16:54 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: fixed a
bug that looked like a feature
2012-06-08 15:46 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: a
rc_id-less format of resource_classes() supported now; DO NOT
MIX!!!
2012-06-08 14:50 lg4
* sql/tables.sqlite: updated sqlite schema: added resource_class
and modified resource_description
2012-06-08 14:38 lg4
* sql/patch_2012-06-08.sql,
modules/Bio/EnsEMBL/Hive/ResourceDescription.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
sql/tables.sql: splitting the resource_description table into two
2012-06-08 11:41 lg4
* modules/Bio/EnsEMBL/Hive/: ResourceClass.pm, DBSQL/DBAdaptor.pm,
DBSQL/ResourceClassAdaptor.pm: adding ResourceClass and
ResourceClassAdaptor
2012-06-08 11:03 lg4
* scripts/beekeeper.pl: print meadow->toString instead of
meadow->type
2012-06-06 21:07 lg4
* hive_config.json, modules/Bio/EnsEMBL/Hive/Utils/Graph.pm:
reorganized the "Graph" part of the config file
2012-06-01 16:00 lg4
* scripts/lsf_report.pl, sql/procedures.mysql: moved creation of
both 'lsf_report' table and 'lsf_usage' view into
scripts/lsf_report.pl
2012-06-01 15:40 lg4
* sql/procedures.mysql: added an SQL view over analysis, worker and
lsf_report tables to show analysis-wide resource usage stats
2012-06-01 15:34 mm14
* modules/Bio/EnsEMBL/Hive/RunnableDB/DatabaseDumper.pm: New
Runnable to create a snapshot of a database
2012-05-31 17:09 lg4
* hive_config.json, modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Valley.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/beekeeper.pl:
moved
submit_workers_max/pending_adjust/total_workers_max/meadow_options
into Config, but they are still configurable from BK's
commandline via config_set(); lots of code cleanup on the way
2012-05-31 16:09 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Config.pm: now with a setter
function
2012-05-31 11:51 lg4
* modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm: only take the first
name, ignore the domain name altogether
2012-05-31 09:32 lg4
* scripts/: beekeeper.pl, runWorker.pl: removed references to old
config file as obsolete
2012-05-31 09:27 lg4
* scripts/beekeeper.pl: moved run_job_id into a separate variable
for clarity
2012-05-30 14:48 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: updated POD about
new()'s arguments
2012-05-30 14:38 lg4
* hive_config.json, modules/Bio/EnsEMBL/Hive/Utils/Graph.pm:
SemaphoreBoxes colours moved under "Colours" section
2012-05-30 14:30 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Config.pm,
modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
scripts/generate_graph.pl: A personal ~/.hive_config.json will be
merged in by default (overriding system defaults)
2012-05-30 12:25 lg4
* scripts/generate_graph.pl: removed reference to the deleted
Util::Graph::Config
2012-05-30 12:16 lg4
* hive_config.json, modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
modules/Bio/EnsEMBL/Hive/Utils/GraphViz.pm: allow configuring
boxes' colourscheme/offset from hive_config.json
2012-05-30 12:00 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
scripts/generate_graph.pl: switching to using the new
configuration file+parser
2012-05-30 11:58 lg4
* hive_config.json, modules/Bio/EnsEMBL/Hive/Utils/Config.pm: a new
JSON-based configuration file and parser
---------------------[before EnsEMBL rel.68]----------------------------------------------------------------
2012-05-28 16:18 lg4
* README, modules/Bio/EnsEMBL/Hive/ResourceDescription.pm: schema
change to allow any short string for meadow_type
2012-05-28 14:10 lg4
* sql/: patch_2012-05-28.sql, tables.sql: schema change to allow
any short string for meadow_type
2012-05-23 15:27 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: a Valley-wide (potentially
multi-meadow) garbage collector
2012-05-23 15:14 lg4
* scripts/beekeeper.pl: bugfix:
schedule_workers_resync_if_necessary should now be run with a
$valley argument
2012-05-23 12:07 lg4
* modules/Bio/EnsEMBL/Hive/Valley.pm, scripts/beekeeper.pl:
pipeline_name now gets propagated to all meadows of the Valley
(preparatory)
2012-05-23 11:12 lg4
* modules/Bio/EnsEMBL/Hive/: Valley.pm, Meadow/LOCAL.pm,
Meadow/LSF.pm: reuse the meadow->name() method to check for
availability [cleanup]
2012-05-23 11:11 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/beekeeper.pl: pass in
a Valley instead of the current_meadow (preparatory)
2012-05-22 17:58 lg4
* modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/beekeeper.pl:
kill-worker-process-by-worker-id: simplified specific Meadow code
by moving general checks out of them
2012-05-22 17:54 lg4
* modules/Bio/EnsEMBL/Hive/Valley.pm: the Meadow hash is now by
type, so no need to iterate to find Meadow-by-Worker
2012-05-22 13:01 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/beekeeper.pl:
untangling a bit. Queen does not need to re-sync and beekeeper
does not need to fetch
2012-05-22 11:50 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm, scripts/beekeeper.pl:
simplified output interface from schedule_workers and
schedule_workers_resync_if_necessary
2012-05-21 22:47 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Valley.pm, scripts/beekeeper.pl: valley
now contains available meadow objects, not classes; beekeeper
contains corrected algorithm for killing a worker
2012-05-18 15:12 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Valley.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/runWorker.pl,
sql/patch_2012-05-18.sql, sql/tables.sql, sql/tables.sqlite:
added schema and API support for meadow_name
2012-05-18 14:00 lg4
* modules/Bio/EnsEMBL/Hive/Valley.pm: fixed a typo bug
2012-05-17 10:33 lg4
* scripts/beekeeper.pl: rename meadow_name to meadow_type to match
the rest of the repository, before it is too late
2012-05-12 08:47 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Valley.pm, scripts/runWorker.pl: moved
meadow identification code to Valley.pm
2012-05-11 16:40 lg4
* scripts/beekeeper.pl: moved the "meadow-collection" code into a
separate class called "Valley"
2012-05-11 16:39 lg4
* modules/Bio/EnsEMBL/Hive/Valley.pm: a new class to represent a
collection of available Meadows
2012-05-10 16:27 lg4
* scripts/: beekeeper.pl, runWorker.pl: removed obsolete
-maximise_concurrency and -batch_size options from both scripts
2012-05-09 12:01 lg4
* modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LOCAL.pm, scripts/beekeeper.pl:
bugfix+feature: -local_cpus renamed into -total_workers_max and
so made available for any meadow (not just LOCAL). Plus some
renames
2012-05-08 17:50 lg4
* scripts/beekeeper.pl: make beekeeper more Meadow-agnostic and
allow it to automatically find alternative Meadow modules in the
INC list
2012-05-08 17:49 lg4
* modules/Bio/EnsEMBL/Hive/Meadow/: LOCAL.pm, LSF.pm: check the
availability of this Meadow on the given machine
2012-05-08 17:48 lg4
* modules/Bio/EnsEMBL/Hive/Utils.pm: new function for finding all
modules in a "directory" across the whole INC list
2012-05-02 15:59 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/: AnalysisCtrlRuleAdaptor.pm,
DataflowRuleAdaptor.pm: removed create_rule() method that is no
longer used
2012-05-02 15:54 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: explicit
new->store rules; retiring create_rule(); switch to using
toString()
2012-05-02 15:53 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisCtrlRule.pm, DataflowRule.pm:
switch to using uniform toString() diagnostic method
2012-05-02 12:10 lg4
* modules/Bio/EnsEMBL/Hive/: DataflowRule.pm,
DBSQL/DataflowRuleAdaptor.pm: move input_id_template
stringification into DFR class
2012-05-01 17:04 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisCtrlRuleAdaptor.pm:
remove_by_condition_analysis_url() is no longer used by Compara,
so has been removed
2012-05-01 16:37 lg4
* modules/Bio/EnsEMBL/Hive/: Process.pm, Worker.pm: a Process does
not need a reference to the Queen
2012-05-01 16:30 lg4
* modules/Bio/EnsEMBL/Hive/Extensions.pm: analyze_tables() does not
seem to be used anymore
2012-05-01 15:55 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm: bugfix: make sure
runtime_msec is stored even when a job dies
2012-05-01 10:58 lg4
* modules/Bio/EnsEMBL/Hive/: Extensions.pm, Process.pm: no longer
used
2012-04-23 23:04 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
modules/Bio/EnsEMBL/Hive/Utils/GraphViz.pm,
scripts/generate_graph.pl: code for showing semaphores as nested
boxes
---------------------[after EnsEMBL rel.67]----------------------------------------------------------------
2012-03-27 12:22 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/: AnalysisJobAdaptor.pm,
BaseAdaptor.pm: a typo in comments
2012-03-26 14:59 mm14
* sql/tables.sqlite: schema_version=67
2012-03-26 14:45 mm14
* sql/tables.sql: schema_version=67
2012-03-20 11:06 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: code
optimization suggested by Matthieu
2012-03-19 17:15 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisCtrlRule.pm, AnalysisJob.pm,
AnalysisStats.pm, DataflowRule.pm, NakedTable.pm,
ResourceDescription.pm: weaken the link from the object back
to the adaptor
2012-03-07 15:27 lg4
* modules/Bio/EnsEMBL/Hive/RunnableDB/FastaFactory.pm: added
support for reading compressed files
2012-03-07 14:41 lg4
* modules/Bio/EnsEMBL/Hive/: PipeConfig/FastaSplitter_conf.pm,
RunnableDB/FastaFactory.pm: a Bio::Seq example factory Runnable
and a matching PipeConfig file
2012-03-01 10:31 lg4
* modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
sql/tables.sql, sql/tables.sqlite: improved STDOUT/STDERR
redirection into files; removal of job logs on success
2012-03-01 10:29 lg4
* modules/Bio/EnsEMBL/Hive/Utils/RedirectStack.pm: a special module
to deal with stacks of filehandle redirection
2012-02-24 15:59 lg4
* scripts/beekeeper.pl: pass debug level parameter from
beekeeper.pl to runWorker.pl
2012-02-23 13:52 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm: bugfix: only store
values that have been set - avoid overriding defaults
2012-02-20 16:04 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: attempt to display each
funnel below its fan
2012-02-16 16:39 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: diagram tool no longer
generates unnecessarily broken edges
2012-02-15 11:26 lg4
* scripts/lsf_report.pl: restrict to DEAD workers only
2012-02-15 11:14 lg4
* scripts/lsf_report.pl: documentation and better user interface
(dumping and undumping supported)
2012-02-14 16:56 lg4
* scripts/lsf_report.pl: turned mem and swap into numeric columns;
careful with units!
2012-02-14 16:40 lg4
* scripts/lsf_report.pl: post-mortem loader of worker memory usage
information from the LSF
2012-02-14 10:36 lg4
* modules/Bio/EnsEMBL/Hive/DependentOptions.pm: Pipeline parameters
cannot take undefined values. Warn and force into 0
---------------------[after EnsEMBL rel.66]----------------------------------------------------------------
2012-01-31 10:58 lg4
* sql/triggers.mysql: an optimization: do not touch analysis_stats
when job.status or job.analysis is not changing
2012-01-20 16:33 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
sql/tables.sql: changes for rel.66
2011-12-08 13:07 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm: checked in
by mistake last time; took back the changes now
2011-12-08 12:08 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm,
modules/Bio/EnsEMBL/Hive/Meadow.pm,
modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm,
scripts/beekeeper.pl, scripts/runWorker.pl,
sql/patch_2011-12-08.sql, sql/tables.sql, sql/tables.sqlite,
sql/triggers.mysql, sql/triggers.sqlite: Removed
maximise_concurrency and added analysis_stats.priority to guide
the scheduler; improved scheduler and LSF meadow
2011-11-29 17:49 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisJob.pm,
modules/Bio/EnsEMBL/Hive/DataflowRule.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/DataflowRuleAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm,
modules/Bio/EnsEMBL/Hive/Utils/Graph.pm,
sql/patch_2011-11-29.sql, sql/tables.sql, sql/tables.sqlite: An
extension to the dataflow-rule-driven semaphores ('2->A', '3->A'
and 'A->1' notation)
2011-11-29 12:59 lg4
* modules/Bio/EnsEMBL/Hive/RunnableDB/JobFactory.pm: removed
'sema_fan_branch_code' parameter since you can now set up a
semaphored group via PipeConfig's language
2011-11-28 09:57 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: no need to check for
semaphores when state is already DONE or PASSED_ON
2011-11-28 09:57 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm: making sure
semaphores are correctly propagated through gc_dataflow and
PASSED_ON state
2011-11-25 10:22 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: allow
more than one input_id_template per analysis
2011-11-24 20:12 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/DataflowRuleAdaptor.pm: bugfix:
funnel_branch is no longer initialized to 1 when undef
2011-11-24 20:05 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: bugfix: correct counting of
total_job_number in non-trigger mode
2011-11-24 12:37 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/DataflowRuleAdaptor.pm: bugfix -
branch_name_2_code should return 1 on undef
2011-11-23 17:05 lg4
* modules/Bio/EnsEMBL/Hive/Utils/: Graph.pm, Graph/Config.pm: show
the dataflow-generated semaphores on the diagram
2011-11-23 15:52 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/: BaseAdaptor.pm,
ObjectAdaptor.pm: bugfix: object adaptor now correctly
reconstructs dbID
2011-11-23 12:23 lg4
* modules/Bio/EnsEMBL/Hive/: PipeConfig/LongMult_conf.pm,
PipeConfig/SemaLongMult_conf.pm, RunnableDB/LongMult/README,
RunnableDB/LongMult/SemaStart.pm: merge the two ways of running
the LongMult pipeline into one
2011-11-23 11:57 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisJob.pm,
modules/Bio/EnsEMBL/Hive/DataflowRule.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisJobAdaptor.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/DataflowRuleAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/LongMult_conf.pm,
sql/patch_2011-11-23.sql, sql/tables.sql, sql/tables.sqlite:
integrated semaphored fans/funnels into dataflow rules
2011-11-22 14:47 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisStats.pm: this value was
returned but never used
2011-11-21 16:44 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisStats.pm, Queen.pm, Worker.pm,
DBSQL/AnalysisJobAdaptor.pm: remove the per-worker batch_size
method
2011-11-21 16:40 lg4
* scripts/: beekeeper.pl, runWorker.pl: remove the per-worker
batch_size flag from scripts
2011-11-17 14:56 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm,
modules/Bio/EnsEMBL/Hive/Worker.pm,
modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm,
sql/triggers.mysql, sql/triggers.sqlite: make num_running_workers
updatable by triggers + better updates during worker check-in
2011-11-10 14:30 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisStats.pm,
DBSQL/AnalysisStatsAdaptor.pm: these four methods were neither
used by Hive nor by Compara code
2011-11-04 12:05 lg4
* docs/eHive_install_usage.txt,
modules/Bio/EnsEMBL/Hive/DBSQL/BaseAdaptor.pm: DBI with versions
older than 1.6 are not supported
2011-10-19 09:54 db8
* modules/Bio/EnsEMBL/Hive/Meadow/LSF.pm, scripts/beekeeper.pl:
WGA/Projection used for CHIMP2.1.4
2011-10-15 10:20 lg4
* sql/: tables.sql, tables.sqlite: release 65
2011-09-23 11:53 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/RunListOfCommandsOnFarm_conf.pm:
An example pipeline that turns lines of a file into jobs and runs
them on the farm
2011-09-20 21:15 lg4
* modules/Bio/EnsEMBL/Hive/Process.pm: extend for other schema
types
2011-09-09 09:57 lg4
* sql/tables.sql: analysis_data may be overcrowded with inserts
during dataflow with input_id longer than 255 characters
2011-09-05 17:18 lg4
* docs/long_mult_example_pipeline.txt: couple of typos
2011-09-05 16:50 lg4
* docs/eHive_install_usage.txt: checkout seems to work better than
export
2011-09-01 16:12 lg4
* modules/Bio/EnsEMBL/Hive/: DependentOptions.pm,
PipeConfig/ApplyToDatabases_conf.pm,
PipeConfig/FailureTest_conf.pm,
PipeConfig/FileZipperUnzipper_conf.pm,
PipeConfig/HiveGeneric_conf.pm, PipeConfig/LongMult_conf.pm,
PipeConfig/SemaLongMult_conf.pm,
PipeConfig/TableDumperZipper_conf.pm: Incorporate ENV hash into
the tree of possible options in order to be able to "require" a
value. And a bit of config inheritance cleanup.
2011-08-25 20:37 lg4
* modules/Bio/EnsEMBL/Hive/: Process.pm, RunnableDB/JobFactory.pm,
RunnableDB/MySQLTransfer.pm, RunnableDB/SqlCmd.pm,
RunnableDB/SystemCmd.pm: Switching from DBI to DBConnection;
data_dbc() as the main focus point; standaloneJob.pl examples of
basic building blocks
2011-08-18 15:28 lg4
* modules/Bio/EnsEMBL/Hive/RunnableDB/: JobFactory.pm, SqlCmd.pm,
SystemCmd.pm: show query/cmd/filename when debug is on
2011-08-18 15:23 lg4
* modules/Bio/EnsEMBL/Hive/Utils/Graph.pm: sqlite databases do not
have a host name, so nothing to display here
2011-08-15 10:58 lg4
* sql/: tables.sql, tables.sqlite: the schema did not allow more
than one job_message per second from one attempt. This limitation
has been removed
2011-07-26 17:35 lg4
* modules/Bio/EnsEMBL/Hive/AnalysisJob.pm: allow standalone jobs to
emit warnings
2011-07-26 11:45 lg4
* sql/: tables.sql, tables.sqlite: for the production of rel.64
2011-07-20 19:40 lg4
* sql/: triggers.mysql, triggers.sqlite: bugfix: more foolproof
maths in triggers
2011-07-15 16:42 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, Worker.pm: moved Worker's
call to Queen->safe_synchronize_AnalysisStats into Worker.pm for
clarity
2011-07-15 15:13 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm: now it
should properly refresh data in the existing object
2011-07-15 14:59 lg4
* modules/Bio/EnsEMBL/Hive/Queen.pm: do not update counters
unnecessarily
2011-07-15 14:23 lg4
* modules/Bio/EnsEMBL/Hive/: AnalysisStats.pm, Queen.pm: formatting
and cleanup
2011-07-15 14:21 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/DBAdaptor.pm,
modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
sql/triggers.mysql, sql/triggers.sqlite: more flexible approach:
allows adding triggers later simply by sourcing the triggers
file
2011-07-15 11:44 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm,
DBSQL/AnalysisStatsAdaptor.pm: perform worker counting centrally
in the Queen
2011-07-15 11:14 lg4
* modules/Bio/EnsEMBL/Hive/DBSQL/AnalysisStatsAdaptor.pm: do not
update these fields when triggers in place
2011-07-14 15:58 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm: added a
new beekeeper_extra_cmdline_options() interface method for
passing cmdline options to beekeeper.pl/runWorker.pl
2011-07-14 15:26 lg4
* modules/Bio/EnsEMBL/Hive/: Queen.pm, DBSQL/AnalysisJobAdaptor.pm,
DBSQL/AnalysisStatsAdaptor.pm, DBSQL/DBAdaptor.pm,
PipeConfig/HiveGeneric_conf.pm: optional (cmdline-controlled,
off-by-default) support of job-counting triggers
2011-07-14 12:52 lg4
* modules/Bio/EnsEMBL/Hive/PipeConfig/HiveGeneric_conf.pm,
sql/foreign_keys.mysql, sql/foreign_keys.sql,
sql/procedures.mysql, sql/procedures.sql: renamed procedures.sql
and foreign_keys.sql into .mysql equivalents
2011-07-14 12:47 lg4
* sql/: triggers.mysql, triggers.sqlite: experimental: do the job
counting immediately by SQL triggers rather than periodically by
Perl code
2011-07-13 17:24 lg4
* modules/Bio/EnsEMBL/Hive/Params.pm, scripts/standaloneJob.pl:
allow automatic dataflow into tables as well
--------------------------------------------------------------------------------------------------------
13.July, 2011 : Leo
* Worker.pm bugfix : do not perform interval_update if no new jobs were done (to avoid division by zero)
8.July, 2011 : Leo
* Process.pm : do not complain on STDERR that honeycomb_dir is not defined, just return immediately
6.July, 2011 : Leo
* standaloneJob.pl : allow dataflow from it into job table (allow creation of jobs from a DB-less job)
* fixed Hive::RunnableDB::Dummy to work in DB-less mode
* standaloneJob.pl : added automatic dataflow to branch 1
* standaloneJob.pl : added more examples to the POD docs
5.July, 2011 : Leo
* destringify(): fixed a bug that prevented parsing of lists
* allow destringification of values from cmdline parser in Utils.pm
* standaloneJob.pl : allow dataflow from it into tables
* added a compara_db example (with support for compara_dba creation from URLs added to Compara code)
4.July, 2011 : Leo
* JobMessageAdaptor: be numerically strict with the value stored in boolean field
* schema, job_message table: make sure we support PASSED_ON status for jobs that undergo GarbageCollection
2.July, 2011 : Leo
* rationalized dynamic batch_size estimation: it is now done centrally both for single Workers (claiming) and the Queen (sync procedure)
1.July, 2011 : Leo
* bugfix in procedures: show count to be 0 for analyses which have no jobs
30.June, 2011 : Andy
* Old implementation of store_out_files() was generating incompatible SQL for SQLite.
Developed a prepared-statement version and removed the IGNORE component, because of the delete that occurs as the first task in the method.
29.June, 2011 : Leo
* rationalized reg_conf/reg_alias names and fixed the way they are passed into runWorker and used in it
16.June, 2011 : Kieron
* POD fixes for Doxygen compatibility
15.June, 2011 : Leo
* big change in JobFactory's interface throughout Hive/Compara. We now address fields by name, not by index (well, by default).
* bugfix in DependentOptions : had to increase the number of allowed iterations to perform all necessary substitutions
* added procedures.sqlite (less powerful, but still something)
14.June, 2011 : Leo
* make dir_revhash available for param substitution
7.June, 2011 : Leo
* bugfix in BaseAdaptor : should only $sth->finish if the handle is defined
3.June, 2011 : Leo
* procedures.sql : when selecting from 'progress' view, show all analyses including the empty ones
2.June, 2011 : Leo
* better support for -analysis_topup functionality: check for multiple analysis sections with the same name in current config file
28.May, 2011 : Leo
* intelligently block dataflow from database-less jobs (and prepare for further addition of database rules)
27.May, 2011 : Leo
* dbh() as the focus point of redirectable SQL handlers (similar to compara_dba() in Compara::BaseRunnable)
26.May, 2011 : Leo
* some common subroutines (including help generation) moving out from scripts into Utils.pm
* first introduction of standaloneJob.pl script for running jobs outside of the Hive (very basic at the moment)
25.May, 2011 : Leo
* cmd_hive.pl fixed to satisfy foreign key constraints
24.May, 2011 : Leo
* branch names functionality stays, but removed from examples as confusing to the users
23.May, 2011 : Leo
* trying to fix the monitor
* DependentOptions: a module only dedicated to parsing of dependent options (about half of HiveGeneric_conf functionality)
* bugfix in Meadow::LSF - correct parsing of bacct output in non-jobarray cases (long-standing)
21.May, 2011 : Leo
* fixed a bug in AnalysisJobAdaptor that prevented the Hive from claiming non-virgin jobs (introduced during sqlite updates)
20.May, 2011 : Leo
* show analysis_id and done_jobs in Utils::Graph
* removed excess diagnostic messages from generate_graph.pl
* removed excess checks from generate_graph.pl - let's assume the user knows what he is doing
* removed unused ProcessWithParams module
19.May, 2011 : Leo
* looking for a better way to set hive_driver, put it into HiveGeneric_conf and LongMult_conf as an example. Still hacky.
* Adaptors: allow autoincrement fields to be specified in the BaseAdaptor
* Adaptors: attempt to store all columns, including the autoincrement
12.May, 2011 : Leo
* removal of tables in procedures.sql now takes into account the foreign keys (so it does it in the right order)
* using DBI-provided methods for unification of the mysql/sqlite-related code, suggested by Andy
* small changes related to production of release_63 (schema_version etc).
11.May, 2011 : Leo
* bugfix: better parsing of "ps" output on MacOSX
* (investigative) trying to find a better way to escape complex delimiters in JobFactory, then abandoned it
10.May, 2011 : Leo
* unification of the mysql/sqlite-related code, suggested by Andy
9.May, 2011 : Leo
* added experimental support of sqlite engine throughout the Hive
* removed the dependency on DBI from several scripts (it was redundant)
* fixed a typo in AnalysisData adaptor
* made sure docs (as well as the code) refer to ENSEMBL_CVS_ROOT_DIR
5.May, 2011 : Leo
* stopped providing encode_hash() to encourage using stringify() instead (Compara code has been patched already)
3.May, 2011 : Andy
* importing of Config into Utils::Graph
19.Apr, 2011 : Leo
* another rename in Queen module
* removed analysis_id and logic_name from the basic job_message table to minimize dependencies.
I expect people to use the more informative 'msg' view anyway.
18.Apr, 2011 : Leo
* new schema drawings in docs
* updated version of adaptors' class hierarchy
14.Apr, 2011 : Leo
* using a new BaseAdaptor as the base class for some simpler adaptors
* RunnableDB::LongMult::SemaStart now shows how to use $self->warning for recording non-fatal messages into job_message tbl.
* Hive::PipeConfig modules will now universally be using ENSEMBL_CVS_ROOT_DIR environment variable
13.Apr, 2011 : Leo
* big schema rename: hive->worker, analysis_job->job, analysis_job_file->job_file, analysis_job_id->job_id,
worker.beekeeper->worker.meadow_type, etc
--------------------------------------------------------------------------------------------------------
30.Mar, 2011 : Leo
* fixed a bug when checking for definedness of branch_name in DataflowRuleAdaptor.pm
29.Mar, 2011 : Leo
* fixed a bug in MySQLTransfer.pm where it never reached automatic dataflow on success
* fixed Process.pm to not import warning() to avoid clashing with ensembl-core's warning
26.Mar-11.Apr, 2011 : Leo
* trying to enforce the 'fetch_by' and 'fetch_all_by' convention for adaptor calls
24.Mar, 2011 : Leo
* some branches can be named instead of numbered (-2, -1, 0, 1)
and 0 now stands for 'ANYFAILURE' (please use with extreme care!)
23.Mar, 2011 : Leo
* sql/procedures.sql gives timing in minutes as well
18.Mar, 2011 : Leo
* one PREPARE statement per dataflow-to-table call (so works for array of output_ids with identical structure)
16.Mar, 2011 : Leo
* Reviewed some procedures.sql adding a procedure for timing and several tricks suggested by Greg.
10-12.Mar, 2011 : Andy
* Introduced a new script for "off-line" creation of pipeline diagrams based on GraphViz.
You will have to refrain from direct creation of jobs/analyses/rules in your pipeline
and do dataflow instead to benefit from the drawing.
9.Mar, 2011 : Leo
* Tried a new debugging trick (dumping the Hive database at key steps in the pipeline)
and documented it in SystemCmd.pm module.
2.Mar, 2011 : Leo
* -keep_alive option added to the Beekeeper to allow it to loop even when all jobs are done.
Requested by Bethan for tracking Blast jobs submitted from the web.
1.Feb, 2011 : Miguel
* fixed init_pipeline.pl that was not propagating an error message from importing a PipeConfig module
27.Jan, 2011 : Leo
* Beekeeper will warn if a pipeline doesn't have a name defined
(to avoid clashes with other unnamed pipelines on the farm).
21.Jan, 2011 : Leo
* Foreign key constraints have been moved into a separate .sql file,
to allow the user to switch them all off.
19-20.Jan, 2011 : Leo
* An analysis can now be empty for blocking purposes (blocks only while it contains undone jobs).
This allows creating more flexible pipelines (that have branches that may never execute, but will not block the rest).
19.Jan, 2011 : Leo
* AnalysisJobAdaptor now retries 3 times to avoid deadlock situations when claiming jobs (may need further attn)
14.Jan, 2011 : Leo
* several textual fields in the schema extended to take longer strings
12.Jan, 2011 : Leo
* MySQLTransfer can now param_substitute()
10-21.Jan, 2011 : Leo
* fixed an issue with $self->o() mechanism in lists (incomplete substitutions).
4-5.Jan, 2011 : Leo
* fixed multiple issues that appeared after introduction of the foreign keys
but only became visible after some testing (thanks to Gautier for extra testing!).
31.Dec, 2010 : Leo Gordon
* Implemented the long-standing plan to remove the schema/code dependency on UUIDs.
Removed the job_claim field from the schema and the code, changed the way jobs are claimed.
28.Dec, 2010 : Leo Gordon
* Updated schema drawings that contain newly added foreign keys
21-23.Dec, 2010 : Leo Gordon
* Added foreign key constraints, figured out that foreign keys ARE enforced in MySQL 5.1.47,
so had to fix some code (and some ensembl-compara code as well, so keep yours up-to-date).
14.Dec, 2010 : Leo Gordon
* first attempt at creating schema drawings using MySQL Workbench. Drawings added to docs/ .
A lot of foreign key constraints were missing, which influenced the drawing.
If they are not enforced by MySQL anyway, why not just add them?
26.Nov, 2010 : Leo Gordon
* changed JobFactory so that parts of it can be re-used by subclassing. Examples in ensembl-compara.
26.Oct, 2010 : Leo Gordon
* fixed a long-standing bug: input_id was supposed to be able to set things (according to compara code)
19-22.Oct, 2010 : Leo Gordon
* Fixed both rule adaptors and the HiveGeneric_conf to prevent them from creating duplicated rules
when a PipeConfig is re-run.
8.Oct, 2010 : Leo Gordon
A new MySQLTransfer.pm Hive Runnable to copy tables over (an amazingly popular task in our pipelines).
Does an integrity check and fails if underlying mysqldump fails.
1.Oct, 2010 : Leo Gordon
* runWorker.pl only prints the worker once per stream.
So if output is redirected to a file, both the file and the output will contain it.
30.Sept-19.Oct, 2010 : Leo Gordon
* detected a strange behaviour of a Worker that was running a RunnableDB
with 'runaway next' statements. Since it was not possible to fix it (seems to be a Perl language issue),
the Worker's code does its best to detect this and exit. Please check that your RunnableDBs do not have runaway nexts.
------------------------------------[previous 'stable' tag]----------------------------------
21-22 Sept, 2010 : Leo Gordon
* a new switch -worker_output_dir allows a particular worker to send its stdout/stderr into the given directory
bypassing the -hive_output_dir if specified.
* streamlining runWorker.pl-Queen.pm communication so that runWorker.pl is now a very lightweight script
(only manages the parameters and output, but no longer runs actual unique functionality)
20 Sept, 2010 : Leo Gordon
* big change: added gc_dataflow (jobs dying because of MEMLIMIT or RUNLIMIT can now be automatically sent
to another analysis with more memory or longer runtime limit). Schema change + multiple code changes.
16 Sept, 2010 : Leo Gordon
* code cleanup and unification of parameter names (older names still supported but not encouraged)
13-14 Sept, 2010 : Leo Gordon
* big change: creating a separate Params class, making it a base class for AnalysisJob,
and removing parameter parsing/reading/setting functionality from the Process. No need for ProcessWithParams now.
This is a big preparation for post-mortem dataflow for resource-overusing jobs.
11 Sept, 2010 : Leo Gordon
* schema change: we are producing release 60!
* bugfix: -alldead did not set 'cause_of_death', now it always sets 'FATALITY' (should we invoke proper GarbageCollection?)
7-9 Sept, 2010 : Leo Gordon
* autoflow() should be a property of a job, not the process. Moved and optimized.
* avoiding filename/pid collisions in Worker::worker_temp_directory, improved reliability.
* removed some Extensions by creating proper hive adaptors (AnalysisAdaptor and MetaContainer)
* changed the way a RunnableDB declares its module defaults. NB!
2-3 Sept, 2010 : Leo Gordon
* optimizing the reliability and the time spent on finding out why LSF killed the jobs
* let MEMLIMIT jobs go into 'FAILED' state from the first attempt (don't waste time retrying)
31 Aug - 1 Sept, 2010 : Leo Gordon
* Added support for finding out WHY a worker is killed by the LSF (MEMLIMIT, RUNLIMIT, KILLED_BY_USER),
the schema is extended to allow this information to be recorded in the 'hive' table.
24 Aug, 2010 : Leo Gordon
* experimental: Queen, Meadow, Meadow::LOCAL and Meadow::LSF changed to make it possible to run several beekeepers
owned by different users over the same database. They _should_not_ collide, but it has not been very thoroughly tested.
23 Aug, 2010 : Leo Gordon
* Worker now reports the reason why it decides to die + good working example (FailureTest framework)
20 Aug, 2010 : Leo Gordon
* Added a generic Stopwatch.pm module to allow for fine timing to be done in a cleaner way
* Added the ability for Runnables to throw messages (which will be recorded in 'job_error' table)
that are not necessarily associated with the job's failure. This change involved schema change as well.
* 'job_error' table is renamed to 'job_message' with the extra field (is_error=0|1) added
13 Aug, 2010 : Javier Herrero
* scripts/cmd_hive.pl: Better support for adding new jobs to an existing analysis. Also, supports adding one single job
13 Aug, 2010 : Leo Gordon
* AnalysisJob and Worker were changed to allow jobs to decide whether it makes any sense to restart them or not.
* a command line switch -retry_throwing_jobs and a corresponding getter/setter method was added to
beekeeper.pl, runWorker.pl and Worker.pm to let the user decide whether to restart failing jobs or not.
11-12 Aug, 2010 : Leo Gordon
* A new table 'job_error' was added to keep track of jobs' termination messages (thrown via 'throw' or 'die'),
this involved schema change and lots of changes in the modules.
* Another big new change is that the Workers no longer die when a Job dies. At least, not by default.
If a Worker managed to catch a dying Job, this fact is registered in the database, but the Worker keeps on taking other jobs.
9-10 Aug, 2010 : Leo Gordon
* RunnableDB::Test renamed into RunnableDB::FailureTest and extended, PipeConfig::FailureTest_conf added to drive this module.
(this was testing ground preparation for job_error introduction)
16 July, 2010 : Leo Gordon
* added -hive_output_dir to beekeeper.pl so that it could be set/overridden from the command line
* dir_revhash is now an importable Util subroutine that is used by both Worker and JobFactory
14 July, 2010 : Leo Gordon
* fixed Meadow::LOCAL so that MacOS's ps would also be supported. eHive now runs locally on Macs :)
13 July, 2010 : Leo Gordon
* added ability to compute complex expressions while doing parameter substitution
12 July, 2010 : Leo Gordon
* added the slides of my HiveTalk_12Jul2010 into docs/
* changed the ambiguous 'output_dir' getter/setter into two methods:
worker_output_dir (if you set it - it will output directly into the given directory) and
hive_output_dir (if you set it, it will perform *reverse_decimal* hashing of the worker_id and create a directory for that particular worker)
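To illustrate the reverse-decimal hashing mentioned above, here is a minimal Perl sketch. The function name revhash_sketch and the decision to drop the most significant digit are assumptions made for this example; the actual eHive implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not the eHive API): turn a numeric id into a
# nested directory path by reversing its decimal digits.
# Assumption: the most significant digit is dropped, so ids below 10
# map to the empty path.
sub revhash_sketch {
    my ($id) = @_;

    my @digits = reverse split //, $id;   # least significant digit first
    pop @digits;                          # drop the most significant digit
    return join '/', @digits;
}

for my $worker_id (7, 42, 12345) {
    my $path = revhash_sketch($worker_id);
    print "$worker_id => '$path'\n";
}
```

Because the least significant digit changes fastest, consecutive ids end up in different top-level directories, which spreads the output files evenly instead of filling one directory.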
2 July, 2010 : Leo Gordon
* [Bugfix] Process::dataflow_output_id() is simplified and generalized
* [Feature/experimental] ProcessWithParams::param_substitute() can now understand #stringifier:param_name# syntax,
several stringifiers added (the syntax is not final, and the stringifiers will probably move out of the module)
* [Feature] TableDumperZipper_conf can now understand negative patterns:
it is an example of how to emulate the syntax "SHOW TABLES NOT LIKE '%abc%'", which does not exist in MySQL 5.1,
by using queries from information_schema.tables
* [Cleanup] the 'did' parameter was finally removed from SystemCmd and SqlCmd to avoid confusion
(the same functionality is already encapsulated into AnalysisJobAdaptor)
* [Feature] SqlCmd can now produce mysql_insert_ids and pass them on as params.
This allows us to grab auto-incremented values on PipeConfig level (see Compara/PipeConfig for examples)
* [Convenience] beekeeper.pl and runWorker.pl can take dbname as a "naked" command line option
(which makes the option's syntax even closer to that of mysql/mysqldump)
* [Convenience] -job_id is now the standard option name understood by beekeeper.pl and runWorker.pl
(older syntax is kept for compatibility)
* [Cleanup] Some unused scripts have been removed from sql/ directory,
drop_hive_tables() added to procedures.sql
* [Bugfix] claim_analysis_status index on analysis_job table has been fixed in tables.sql
and a corresponding patch file added
13 June, 2010 : Leo Gordon
* Added support for dataflow-into-tables, see LongMult example.
10 June, 2010 : Leo Gordon
* A bug preventing users from setting hive_output_dir via pipeline_wide_parameters has been fixed.
3 June, 2010 : Leo Gordon
* one important workaround for LSF command line parsing bug
(the LSF was unable to create job arrays where pipeline name started with certain letters)
* lots of new documentation added and old docs removed (POD documentation in modules as well as eHive initialization/running manuals).
Everyone is encouraged to use the new init_pipeline.pl and PipeConfig-style configuration files.
* a schema change that makes it possible to have multiple input_id_templates per dataflow branch
(this feature is already accessible via API, but not yet implemented in init_pipeline.pl)
* JobFactory now understands multi-column input and input_id templates can be written to refer to individual columns.
The 'inputquery' mode has been tested and it works.
Both 'inputfile' and 'inputcmd' should be able to split their input on param('delimiter'), but this has not yet been tested.
12 May, 2010 : Leo Gordon
* init_pipeline.pl can be given a PipeConfig file name instead of full module name.
* init_pipeline.pl has its own help that displays pod documentation (same mechanism as other eHive scripts)
* 3 pipeline initialization modes supported:
full (default), -analysis_topup (pipeline development mode) and -job_topup (add more data to work with)
11 May, 2010 : Leo Gordon
* We finally have a universal framework for commandline-configurable pipelines' setup/initialization.
Each pipeline is defined by a Bio::EnsEMBL::Hive::PipeConfig module
that derives from Bio::EnsEMBL::Hive::PipeConfig::HiveGeneric_conf .
Compara pipelines derive from Bio::EnsEMBL::Compara::PipeConfig::ComparaGeneric_conf .
These configuration modules are driven by ensembl-hive/scripts/init_pipeline.pl script.
Having set up what is an 'option' in your config file, you can then supply values for it
from the command line. Option interdependency rules are also supported to a certain extent,
so you can supply *some* options, and rely on the rules to compute the rest.
* Several example PipeConfig files have been written to show how to build pipelines both from 'standard blocks'
(SystemCmd, SqlCmd, JobFactory, Dummy, etc) and from RunnableDBs written specifically for the task
(components of LongMult pipeline).
* Both eHive RunnableDB::* and PipeConfig::* modules have been POD-documented.
* A new 'input_id_template' feature has been added to the dataflow mechanism to allow for more flexibility
when integrating external scripts or unsupported software into eHive pipelines.
You can now dataflow from pretty much anything, even if the Runnable did not support dataflow natively.
The corresponding schema patch is in ensembl-hive/sql
* pipeline-wide parameters (kept in 'meta' table) no longer have to be scalar.
Feel free to use arrays or hashes if you need them. init_pipeline.pl also supports multilevel options.
* SqlCmd now has a concept of 'sessions': you can supply several queries in a list that will be executed
one after another. If a query creates a temporary table, all the following ones down the list
will be able to use it.
* SqlCmd can run queries against any database - not necessarily the eHive one. You have to supply a hashref
of connection parameters via $self->param('db_conn') to make it work. It still runs against the eHive
database by default.
* JobFactory now supports 4 sources: inputlist, inputfile, inputquery and inputcmd.
*All* of them now support deep param_substitution. Enjoy.
* NB! JobFactory's substituted parameter syntax changed:
it no longer understands '$RangeStart', '$RangeEnd' and '$RangeCount'.
But it understands '#_range_start#', '#_range_end#' and '#_range_count#' - should be pretty easy to fix.
* several smaller bug fixes and optimizations of the code have also been done.
A couple of utility methods have moved places, but it looks like they were mostly used internally.
Shout if you have lost anything and we'll try to find it together.
26 March, 2010 : Leo Gordon
* branch_code column in analysis_job table is unnecessary and was removed
Branching using branch_codes is a very important and powerful mechanism,
but it is completely defined in dataflow_rule table.
branch_code() WAS at some point a getter/setter method in AnalysisJob,
but it was only used to pass parameters around in the code (now obsolete),
and this information was never reflected in the database,
so analysis_job.branch_code was always 1 no matter what.
* stringification using Data::Dumper with parameters was moved out of init_pipeline.pl and JobFactory.pm
and is now in a separate Hive::Utils.pm module (Hive::Utils::stringify can be imported, inherited or just called).
It is transparently called by AnalysisJobAdaptor when creating jobs, which allows
input_ids to be passed as hashrefs and not strings. Magic happens on the adaptor level.
* Queen->flow_output_job() method has been made obsolete and removed from the Queen.pm
Dataflow is now completely handled by Process->dataflow_output_id() method,
which now handles arrays/fans of jobs and semaphores (more on this below).
Please always use dataflow_output_id() if you need to create a new job or fan of jobs,
as this is the top level method for doing exactly this.
Only call the naked adaptor's method if you know what you're doing.
* JobFactory module has been upgraded (simplified) to work through dataflow mechanism.
It can no longer create analyses, but that's not necessary as it should be init_pipeline's job.
Family pipeline has been patched to work with the new JobFactory module.
* branched dataflow was going to meet semaphores at some point, the time is near.
dataflow_output_id() is now semaphore aware, and can propagate semaphores through the control graph.
A new fan is hooked on its own semaphore; when the semaphored_job is not specified we do semaphore propagation.
Inability to create a job in the fan is tracked and the corresponding semaphore_count decreased
(so users do not have to worry about it).
* LongMult examples have been patched to work with the new dataflow_output_id() method.
* init_pipeline.pl is now more flexible and can understand simplified syntax for dataflow/control rules
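The Data::Dumper-based stringification described in this entry can be sketched roughly as follows. This is an illustration only: stringify_sketch is a hypothetical name, the Data::Dumper settings are an assumption about what a compact serializer would use, and the sample keys and values are invented for the example. It is not necessarily how Hive::Utils::stringify is written.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: serialize a Perl structure into a compact,
# single-line string suitable for storing as an input_id.
sub stringify_sketch {
    my ($structure) = @_;

    local $Data::Dumper::Indent   = 0;   # single line, no newlines
    local $Data::Dumper::Terse    = 1;   # no "$VAR1 = " prefix
    local $Data::Dumper::Sortkeys = 1;   # stable key order: equal hashes give equal strings

    return Dumper($structure);
}

# Illustrative values only.
my $input_id = stringify_sketch({ b_multiplier => '327358788', a_multiplier => '9650156169' });
print "$input_id\n";
```

Sorting the keys matters when the resulting string is compared or used to detect duplicate jobs, because Perl does not guarantee hash ordering.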
22 March, 2010 : Leo Gordon
* Bio::EnsEMBL::Hive::ProcessWithParams is the preferred way of parsing/passing around the parameters.
Module-wide, pipeline-wide, analysis-wide and job-wide parameters and their precedence.
* A new init_pipeline.pl script to create and populate pipelines from a perl hash structure.
Tested with ensembl-hive/docs/long_mult_pipeline.conf and ensembl-compara/scripts/family/family_pipeline.conf . It works.
* Bio::EnsEMBL::Hive::RunnableDB::SystemCmd now supports parameter substitution via #param_name# patterns.
See usage examples in the ensembl-compara/scripts/family/family_pipeline.conf
* There is a new Bio::EnsEMBL::Hive::RunnableDB::SqlCmd that does what it says,
and also supports parameter substitution via #param_name# patterns.
See usage examples in the ensembl-compara/scripts/family/family_pipeline.conf
* Bio::EnsEMBL::Hive::RunnableDB::JobFactory has 3 modes of operation: inputlist, inputfile, inputquery.
See usage examples in the ensembl-compara/scripts/family/family_pipeline.conf
* some rewrite of the Queen/Adaptors code to give us more developmental flexibility
* support for semaphores (job-level control rules) in SQL schema and API
- partially tested, has some quirks, waiting for a more serious test by Albert
* support for resource requirements in SQL schema, API and on init_pipeline config file level
Tested in the ensembl-compara/scripts/family/family_pipeline.conf . It works.
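The #param_name# substitution that SystemCmd and SqlCmd gained in this entry can be pictured with a small sketch. The helper name substitute_sketch, the sample command and the rule of leaving unknown names untouched are all assumptions made for illustration; the real modules may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not the eHive API): replace every #name# pattern
# in a template with the value of the parameter of that name.
# Assumption: patterns naming an unknown parameter are left as they are.
sub substitute_sketch {
    my ($template, $params) = @_;

    my $result = $template;
    $result =~ s/#(\w+)#/exists $params->{$1} ? $params->{$1} : '#' . $1 . '#'/ge;
    return $result;
}

my %params = ( dbname => 'my_db', table => 'genes' );   # illustrative values
print substitute_sketch('mysqldump #dbname# #table# > #table#.sql', \%params), "\n";
# prints: mysqldump my_db genes > genes.sql
```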
3 December, 2009 : Leo Gordon
beekeeper.pl, runWorker.pl and cmd_hive.pl
got new built-in documentation accessible via perldoc or directly.
2 December, 2009 : Leo Gordon
Bio::EnsEMBL::Hive::RunnableDB::LongMult example toy pipeline has been created
to show how to do various things "adult pipelines" perform
(job creation, data flow, control blocking rules, usage of intermediate tables, etc).
Read Bio::EnsEMBL::Hive::RunnableDB::LongMult for a step-by-step instruction
on how to create and run this pipeline.
30 November, 2009 : Leo Gordon
Bio::EnsEMBL::Hive::RunnableDB::JobFactory module has been added.
It is a generic way of creating batches of jobs with the parameters
given by a file or a range of ids.
Entries in the file can also be randomly shuffled.
13 July, 2009 : Leo Gordon
Merging the "Meadow" code from this March's development branch.
Because it separates LSF-specific code from higher level, it will be easier to update.
-------------------------------------------------------------------------------------------------------
Albert, sorry - in the process of merging into the development branch I had to remove your HIGHMEM code.
I hope it is a temporary measure and we will be having hive-wide queue control soon.
If not - you can restore the pre-merger state by updating with the following command:
cvs update -r lg4_pre_merger_20090713
('maximise_concurrency' option was carried over)
-------------------------------------------------------------------------------------------------------
3 April, 2009 : Albert Vilella
Added a new maximise_concurrency 1/0 option. When set to 1, it will
fetch jobs that need to be run in an order that maximises
the number of different analyses being run. This is useful for cases
where different analyses hit different tables and the overall sql
load can be kept higher without breaking the server, instead of
having lots of jobs for the same analysis trying to hit the same
tables.
Added quick HIGHMEM option. This option is useful when a small
percent of jobs are too big and fail in normal conditions. The
runnable can check if it's the second time it's trying to run the
job, if it's because it contains big data (e.g. gene_count > 200)
and if it isn't already in HIGHMEM mode. Then, it will call
reset_highmem_job_by_dbID and quit:
    if ($self->input_job->retry_count == 1) {
        if ($self->{'protein_tree'}->get_tagvalue('gene_count') > 200 && !defined($self->worker->{HIGHMEM})) {
            $self->input_job->adaptor->reset_highmem_job_by_dbID($self->input_job->dbID);
            $self->DESTROY;
            throw("Alignment job too big: send to highmem and quit");
        }
    }
Assuming there is a
beekeeper.pl -url <blah> -highmem -meadow_options "<lots of mem>"
running, or a
runWorker.pl <blah> -highmem 1
with lots of mem running, it will fetch the HIGHMEM jobs as if they
were "READY but needs HIGHMEM".
Also added a modification to Queen that will not synchronize as
often when more than 450 jobs are running and the load is above
0.9, so that the queries to analysis tables are not hitting the sql
server too much.
23 July, 2008 : Will Spooner
Removed remaining ensembl-pipeline dependencies.
11 March, 2005 : Jessica Severin
Project is reaching a very stable state. New 'node' object Bio::EnsEMBL::Hive::Process
allows for independence from Ensembl Pipeline and provides extended process functionality
to manipulate hive job objects, branch, modify hive graphs, create jobs, and other hive
process specific tasks. Some of this extended 'Process' API may still evolve.
7 June, 2004 : Jessica Severin
This project is under active development and should be classified as pre-alpha.
Most of the design has been settled and I'm in the process of implementing the details,
but entire objects could disappear or drastically change as I approach the end.
Watch this space for further developments.
11 March, 2005 : Jessica Severin