Open
294 commits
3cea3f3
lib: improve repeat downloader by updating versions from config file,…
jtarraga Mar 8, 2024
f308f25
lib: improve conservation downloader by updating versions from config…
jtarraga Mar 22, 2024
2e6e895
lib: update regulation download manager, and the configuration file, …
jtarraga Mar 28, 2024
a6688d0
lib: update configuration file; and create version files for COSMIC a…
jtarraga Apr 3, 2024
c2345d4
lib: update CellBase builder for clinical variants, #TASK-5776, #TASK…
jtarraga Apr 3, 2024
df0f1e0
lib: fix Gwas Catalog builder for clinical variants, #TASK-5776, #TAS…
jtarraga Apr 4, 2024
d4cba15
lib: refactor code by changing the DownloadProperties.URLProperties, …
jtarraga Apr 5, 2024
a3e9684
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
c7ad55d
Rename get file name method
imedina Apr 11, 2024
e92b676
lib: update CellBase downloaders according to the DownloadProperties.…
jtarraga Apr 11, 2024
281fb22
Resolve conflicts, #TASK-5564
jtarraga Apr 11, 2024
e18506b
lib: update CellBase downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 12, 2024
69a58bf
core: update CellBase configuration file, #TASK-5775, #TASK-5564
jtarraga Apr 15, 2024
d4e0cd6
lib: update MANE Select downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
6ee2f78
lib: update LRG, HGNC, Cancer HotSpot, DGIDB, Gene Uniprot Xref, Gene…
jtarraga Apr 18, 2024
d794ceb
lib: update RefSeq downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
1b751de
lib: update missense scores (REVEL) downloader, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
b635333
lib: update CADD and clinical variant downloaders, #TASK-5775, #TASK-…
jtarraga Apr 18, 2024
106b96d
lib: update protein downloaders, #TASK-5775, #TASK-5564
jtarraga Apr 18, 2024
55afe6b
lib: update gene downloader (specially for ensembl data), and improve…
jtarraga Apr 19, 2024
d81b68f
Merge branch 'TASK-5564' into TASK-5387
jtarraga Apr 19, 2024
88c2b17
core: add Ensembl primary fasta URL into the configuration file for t…
jtarraga Apr 22, 2024
eee13e3
lib: update genome download manager by declaring and using constants …
jtarraga Apr 22, 2024
cd367b9
app: update genome builder by using constants from the class EtlCommo…
jtarraga Apr 22, 2024
ce6f8d5
app: fix sonnar issues in BuildCommandExecutor, #TASK-5564
jtarraga Apr 22, 2024
3566e01
app: improve log/exception messages in DownloadCommandExecutor, #TASK…
jtarraga Apr 22, 2024
cd94452
app: update repeats builder, and improve log/exception messages, #TAS…
jtarraga Apr 22, 2024
148814f
lib: update the repeats builder by removing the hardcoded filenames a…
jtarraga Apr 22, 2024
30a4c87
lib: update conservation builder by removing the hardcoded filenames …
jtarraga Apr 22, 2024
85e17db
lib: call bigWigToBedGraph to convert the GERP bigwig to bed graph fi…
jtarraga Apr 23, 2024
0223cb5
lib: include log messages, #TASK-5564
jtarraga Apr 23, 2024
833c337
lib: improve ProteinBuilder by removing hardcoded file names, adding …
jtarraga Apr 23, 2024
01deb0c
lib: move DataSource reader from ConservationBuilder to the parent Ce…
jtarraga Apr 24, 2024
9416894
lib: move the function to split UniProt into chuncks from the protein…
jtarraga Apr 24, 2024
909c0b2
core: fix regulation URLs in the configuration file, #TASK-5775, #TAS…
jtarraga Apr 24, 2024
71d8056
lib: launch a CellBase exception if executing a command (wget, gunzip…
jtarraga Apr 24, 2024
1544824
lib: fix sonnar issues, #TASK-5775, #TASK-5564
jtarraga Apr 24, 2024
3e43874
lib: move the function to parse and build PFMs from the regulation do…
jtarraga Apr 24, 2024
959e423
core: update ontology section of the CellBase configuration since ont…
jtarraga Apr 25, 2024
158c259
lib: update ontology download since ontology versions will be taken f…
jtarraga Apr 25, 2024
0b83831
app: update the build command executor to check/copy the ontology ver…
jtarraga Apr 25, 2024
39f0f41
lib: improve the ontology builder by removing hardcoded filenames, ad…
jtarraga Apr 25, 2024
5c3dae0
lib: improve the PharmGKB downloader by moving the function to unzip …
jtarraga Apr 25, 2024
971235e
lib: improve the PharmGKB builder by adding checks and log messages; …
jtarraga Apr 25, 2024
cd444b0
lib: improve the PubMed downloader by adding log messages and fixing …
jtarraga Apr 25, 2024
e19fe73
lib: create maps to get the names, categories and version filenames f…
jtarraga Apr 26, 2024
a29afe3
lib: update according to the EtlCommons changes, #TASK-5775, #TASK-5564
jtarraga Apr 26, 2024
377ee9c
lib: improve PubMed builder by adding checks, log messages and fixing…
jtarraga Apr 26, 2024
997c8ec
lib: update CADD downloader according to last changes, #TASK-5775, #T…
jtarraga Apr 26, 2024
96078b7
lib: improve the CADD builder by adding checks, log messages, cleanin…
jtarraga Apr 26, 2024
3163a90
lib: update the REVEL downloader according to the last changes, and a…
jtarraga Apr 26, 2024
bc22fad
lib: add log messages, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
0c9a299
lib: improve the Revel builder by fixing sonnar issues and adding che…
jtarraga Apr 29, 2024
4f9e39a
lib: update CellBase downloaders according to the last changes, #TASK…
jtarraga Apr 29, 2024
1586a77
app: update load command executor according to the EtlCommons changes…
jtarraga Apr 29, 2024
c7c398a
lib: update CellBase builders according to the EtlCommons changes, #T…
jtarraga Apr 29, 2024
754384a
lib: fix revel builder, #TASK-5776, #TASK-5564
jtarraga Apr 29, 2024
24eb091
configuration: update versions
imedina May 7, 2024
fc09da4
app: add bash script to fix the downloaded MirTarBase file, #TASK-577…
jtarraga May 7, 2024
09d33a0
core: add some comments to the configuration file, #TASK-5775, #TASK-…
jtarraga May 7, 2024
303585d
lib: update Ensembl/RefSeq indexers and builders (include major impro…
jtarraga May 7, 2024
68c47ef
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga May 7, 2024
312c654
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 7, 2024
5665648
Merge branch 'TASK-5387' into TASK-5388
jtarraga May 7, 2024
a25b9c1
core: fix PGS section in the configuration file, #TASK-5406, #TASK-5387
jtarraga May 8, 2024
df05c91
app: add PGS_DATA (polygenic scores) as valid data in the CellBase bu…
jtarraga May 8, 2024
e7c2385
lib: update clinical variant downloader by moving the split ClinVar f…
jtarraga May 10, 2024
f5b7c34
lib: update clinical variant builder by including the split ClinVar f…
jtarraga May 10, 2024
a4fca6b
lib: update code to the last changes, #TASK-5564
jtarraga May 10, 2024
8598f08
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 10, 2024
dec89f8
Merge branch 'TASK-5387' into TASK-5388
jtarraga May 10, 2024
57c6f6f
lib: include SpliceAI/MMSplice in the configuration file, and create …
jtarraga May 11, 2024
c131459
lib: remove deprecated functions, #TASK-5575, #TASK-5564
jtarraga May 11, 2024
3ef70b1
lib: improve PGS Catalog downloader, #TASK-5406, #TASK-5387
jtarraga May 15, 2024
31bf3a2
lib: improve PGS Catalog builder, #TASK-5407, #TASK-5387
jtarraga May 15, 2024
c8d416e
lib: update CellBase loader for PGS Catalog data, #TASK-5410, #TASK-5387
jtarraga May 15, 2024
a8a047c
lib: improve gene downloader by taking into account the manually down…
jtarraga May 16, 2024
100d6f3
lib: update gene builder (Ensembl/RefSeq) according to last changes, …
jtarraga May 17, 2024
1852f3e
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 17, 2024
910ffd3
lib: refactor PGS builder to solve the RocksDB issue, #TASK-5407, #TA…
jtarraga May 18, 2024
3939ac3
lib: improve PGS builder by speeding-up RocksDB, #TASK-5407, #TASK-5387
jtarraga May 20, 2024
0cd4b80
lib: udate Ensembl/RefSeq gene builder to gunzip FASTA files before b…
jtarraga May 27, 2024
e42cd7e
Merge branch 'develop' into TASK-5564
jtarraga May 27, 2024
6eac380
Merge branch 'TASK-5564' into TASK-5387
jtarraga May 27, 2024
4d965d7
Merge branch 'develop' into TASK-5564
imedina Jun 22, 2024
fc65d14
lib: add hpo filter to GeneQuery
imedina Jun 23, 2024
84ad97b
Many improvements and fixes:
imedina Jul 2, 2024
aaec065
* Add new ensembl_canonical.pl
imedina Jul 2, 2024
dad180d
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Jul 2, 2024
694b81d
lib: use DockerUtils to execute Perl script from docker image, #TASK-…
jtarraga Jul 2, 2024
c8e719a
test: update JUnit tests, #TASK-5564
jtarraga Jul 3, 2024
19efdf4
cicd: update task.yml to deploy cellbase-builder docker, #TASK-5564
jtarraga Jul 3, 2024
fcbb680
build: create the MiRTarBase parser for .xlsx files, #TASK-5576, #TAS…
jtarraga Jul 4, 2024
10a579a
Builder improvements and several data cleaning
imedina Jul 4, 2024
87d95e8
Merge branch 'TASK-5564' of github.com:opencb/cellbase into TASK-5564
imedina Jul 4, 2024
c6bcbdd
Gene downloader fixes
imedina Jul 4, 2024
0a6a84f
Add VariationDownloader
imedina Jul 5, 2024
3dcad47
Add VariationDownloader
imedina Jul 5, 2024
5eb33ae
app: update Dockerfile for cellbase-builder in order to allow the scr…
jtarraga Jul 8, 2024
f44dcc5
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Jul 8, 2024
2b226fe
lib: add variation to the EtlCommons dataVersionFilenamesMap, #TASK-5…
jtarraga Jul 9, 2024
510819c
Merge branch 'develop' into TASK-5564
jtarraga Jul 22, 2024
5514177
lib: remove unused variables, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
ae9a817
core: add the field 'id' in DataSource model, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
20c554b
core: update DGIdb in the configuration file, #TASK-5575, #TASK-5564
jtarraga Jul 23, 2024
9d2d4fe
lib: check if genome data is already downloaded before downloading to…
jtarraga Jul 23, 2024
299003b
lib: add the parameter 'assembly' to command line when calling the sc…
jtarraga Jul 23, 2024
1d171d5
lib: update GeneDownloadManager to call the script gene_extra_info.pl…
jtarraga Jul 24, 2024
d10931d
lib: improve genome and conservation downloaders by checking if data …
jtarraga Jul 24, 2024
b422f3a
lib: improve repeats downloaders by checking if data is already downl…
jtarraga Jul 24, 2024
d0c0ba3
lib: improve regulation downloader by checking if data is already dow…
jtarraga Jul 24, 2024
1dc504f
lib: fix motif features folder for regulation downloader, #TASK-5575,…
jtarraga Jul 24, 2024
4ba788d
lib: fix minor sonnar issue, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
6fc7129
lib: improve protein downloader by checking if data is already downlo…
jtarraga Jul 24, 2024
8ed0e0d
lib: improve variation downloader by checking if data is already down…
jtarraga Jul 24, 2024
1442766
lib: fix variation folder in downloader, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
e48d27d
core: remove DISGENET, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
642935a
lib: improve gene downloader, removing DISGENET, fixing sonnar issues…
jtarraga Jul 24, 2024
8030b02
lib: fix command line to execute Perl script, #TASK-5575, #TASK-5564
jtarraga Jul 24, 2024
e17e51d
lib: add files generated by scripts in the version JSON files, #TASK-…
jtarraga Jul 25, 2024
733cade
lib: improve genome builder by checking files, and fixing sonnar issu…
jtarraga Jul 25, 2024
ddc1056
lib: take into account the parameter --keep when gunzip, #TASK-5576, …
jtarraga Jul 25, 2024
8c6dc78
lib: improve conservation builder by adding checks, log messages and …
jtarraga Jul 26, 2024
847f835
lib: add support for multi-species, checks and log messages in the re…
jtarraga Jul 26, 2024
b0d1c67
lib: add support for multi-species, checks and log messages in regula…
jtarraga Jul 26, 2024
039aa81
lib: fix protein builder, #TASK-5576, #TASK-5564
jtarraga Jul 29, 2024
7f77dec
lib: fix gene downloader for RefSeq files, #TASK-5575, #TASK-5564
jtarraga Jul 29, 2024
0eb898e
lib: improve gene (Ensembl/RefSeq) builder by supporting multi-specie…
jtarraga Jul 31, 2024
1d47fd9
lib: fix sonnar issues, #TASK-5576, #TASK-5564
jtarraga Jul 31, 2024
7fbc054
lib: add variant and variant_structural_variations in the configurati…
jtarraga Jul 31, 2024
d483dcf
app: improve CellBase loader by creating a new function to be reused …
jtarraga Aug 1, 2024
7f62ce7
lib: improve genome sequence and info loader, #TASK-6142, #TASK-5564
jtarraga Aug 1, 2024
0602bba
app: update CellBase loader for conservation data, #TASK-6142, #TASK-…
jtarraga Aug 1, 2024
2b4fbeb
app: update CellBase loader for genes and proteins according to the p…
jtarraga Aug 1, 2024
d693f57
lib: add VariantBuilder to generate the variation JSON files from VCF…
jtarraga Aug 1, 2024
38400c1
app: update the CellBase loader for variation data according to the l…
jtarraga Aug 1, 2024
3117337
app: add check before building variation data, #TASK-5776, #TASK-5564
jtarraga Aug 2, 2024
9c810e7
lib: skip API-KEY param when parsing variant quey, #TASK-5564
jtarraga Aug 2, 2024
ec5f21a
server: update RESTful server to take into account multi-species, #TA…
jtarraga Aug 2, 2024
36c3609
lib: extract the FutureSpliceScoreAnnotator in a file to reduce the V…
jtarraga Aug 2, 2024
efa4824
lib: update the VariantAnnotationCalculator to support multi-species,…
jtarraga Aug 2, 2024
4326fa3
lib: add log messages in protein builder, #TASK-5776, #TASK-5564
jtarraga Aug 2, 2024
2c7ddfb
lib: set variant ID in VariantBuilder, #TASK-5576, #TASK-5564
jtarraga Aug 5, 2024
78211d0
lib: remove System.exit, #TASK-5576, #TASK-5564
jtarraga Aug 5, 2024
e0c6a13
lib: fix VariationBuilder by converting SV values from Ensembl to sta…
jtarraga Aug 5, 2024
81e4cb1
lib: add new command 'data-list' to display the list of data supporte…
jtarraga Aug 6, 2024
280fd67
app: update build options and fix sonnar issues, #TASK-5576, #TASK-5564
jtarraga Aug 6, 2024
2235e5c
app: update CLI option descriptions for loading, exporting, indexing.…
jtarraga Aug 6, 2024
08b0e1d
Prepare Port Patch Cellbase 5.8.3 -> 6.3.0 #TASK-6647
juanfeSanahuja Aug 6, 2024
dee7972
Merge branch 'develop' into TASK-6647-dev
juanfeSanahuja Aug 6, 2024
6a4c16a
test: update JUnit tests according to the latest changes, #TASK-5564
jtarraga Aug 7, 2024
68c9f43
lib: improve variation builder by setting xref and annotation, and re…
jtarraga Aug 7, 2024
914b9c1
lib: remove break for testing, #TASK-5576, #TASK-5564
jtarraga Aug 7, 2024
3538e14
core: add ontology data into configuration file for "mus musculus" an…
jtarraga Aug 7, 2024
d0d92a3
lib: update ontology downloader and take into account multi-species s…
jtarraga Aug 8, 2024
d51114b
lib: update ontology builder and take into account multi-species supp…
jtarraga Aug 8, 2024
24450d3
app: update load command executor for ontology data according to the …
jtarraga Aug 8, 2024
132382d
app: check data according to the species before loading data, #TASK-6…
jtarraga Aug 8, 2024
d556c4c
app: fix sonnar issues, #TASK-6142, #TASK-5564
jtarraga Aug 8, 2024
60860ed
Merge pull request #706 from opencb/TASK-6647-dev
juanfeSanahuja Aug 8, 2024
1f3572c
lib: fix the function to save status and message of the downloaded fi…
jtarraga Aug 9, 2024
6056655
Merge branch 'develop' into TASK-5564
jtarraga Aug 10, 2024
a8d6368
core: add dbSNP in config file (removed after merging), #TASK-5564
jtarraga Aug 10, 2024
2950c0e
Merge branch 'TASK-5564' into TASK-5387
jtarraga Aug 13, 2024
162f34d
add: improve species and assembly parameter descriptions, #TASK-5575,…
jtarraga Aug 13, 2024
344e92e
test: fix JUnit tests by updating configuration files, #TASK-5564
jtarraga Aug 13, 2024
22770de
server: fix meta/health for multiple species, #TASK-6426, #TASK-5564
jtarraga Aug 14, 2024
4cffc55
server: improve MetaWSServer for multiple species support, #TASK-6426…
jtarraga Sep 3, 2024
2889192
lib: limit WriteBatch for number of items, #TASK-5407, #TASK-5387
jtarraga Sep 12, 2024
8c8a4ea
lib: improve PGS builder, #TASK-5407, #TASK-5387
jtarraga Oct 1, 2024
28a57ba
lib: fix sonnar issues, #TASK-5407, #TASK-5387
jtarraga Oct 1, 2024
a49783e
config: update version
imedina Feb 3, 2025
76d849a
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Feb 18, 2025
2a0eb28
lib: update MiRTarBase indexer to take into account the new format (C…
jtarraga Feb 20, 2025
f5edb1f
lib: fix checkstyle and sonnar issues, #TASK-5564
jtarraga Feb 20, 2025
e65068d
lib: add _chunkIds in the collection conservation when loading data, …
jtarraga Mar 5, 2025
f45c2fc
config: update all data sources
imedina May 26, 2025
32a1323
Merge branch 'release-6.x.x' into TASK-5564
jtarraga May 28, 2025
1952fd4
core: fix UniProt version to 2025-02, #TASK-5564
jtarraga Jun 2, 2025
3146041
lib: update UniProt builder to support release 2025-02, #TASK-5576, #…
jtarraga Jun 2, 2025
13258bc
core: update the section PubMed of the configuration file, #TASK-5575…
jtarraga Jun 2, 2025
a7fd402
core: update configuration file for regulatory and motif features, #T…
jtarraga Jun 4, 2025
247a135
lib: update gene builder for gnomAD 4.1 for Ensembl, and include it f…
jtarraga Jun 4, 2025
7e53142
lib: update regulatory feature builder, #TASK-5576, #TASK-5564
jtarraga Jun 4, 2025
e3f80b5
lib: add more gnomAD 4.1 constraint scores, #TASK-5576, #TASK-5564
jtarraga Jun 5, 2025
cf425d1
lib: add imprented gene data from geneimprint.com, #TASK-7745, #TASK-…
jtarraga Jun 10, 2025
a824d7a
app: fix gene build path, #TASK-5576, #TASK-5564
jtarraga Jun 10, 2025
eec00af
lib: set category gene_annotation for geneimprint, and add to common …
jtarraga Jun 10, 2025
694fd38
core: fix geneimprint URL in configuration file, #TASK-7745, #TASK-5564
jtarraga Jun 10, 2025
504cb82
core: update configuration file, #TASK-5575, #TASK-5564
jtarraga Jun 25, 2025
5cd942b
lib: fix HPO file parser (gene builder indexer), #TASK-5576, #TASK-5564
jtarraga Jun 25, 2025
7e1ecb6
lib: update COSMIC builder to support v101 and above, #TASK-5576, #TA…
jtarraga Jun 25, 2025
b7f47a1
lib: fix gnomAD constraint indexer, #TASK-5576, #TASK-5564
jtarraga Jun 25, 2025
8932c7f
lib: update COSMIC indexer, #TASK-5576, #TASK-5564
jtarraga Jun 25, 2025
1c5a5da
lib: improve geneimprint indexer, #TASK-5576, #TASK-5564
jtarraga Jun 25, 2025
a5590e9
Merge branch 'release-6.x.x' into TASK-5564
jtarraga Jul 7, 2025
85c1dc6
Merge branch 'TASK-5564' into TASK-5387
jtarraga Jul 7, 2025
f981f22
Merge branch 'TASK-5387' into TASK-5388
jtarraga Jul 8, 2025
18dddab
pom: upgrade biodata and java-commons-lib dependencies, #TASK-5564
jtarraga Jul 8, 2025
e8a4b55
lib: add ChimerDB data (gene fusion) to gene annotation, #TASK-7830, …
jtarraga Jul 27, 2025
7757e70
lib: update according to biodata changes, #TASK-7830, #TASK-5564
jtarraga Jul 28, 2025
12b699a
lib: update according to biodata changes, #TASK-7745, #TASK-7830, #TA…
jtarraga Jul 28, 2025
1165ee7
lib: update variant annotation calculator to take into account imprin…
jtarraga Jul 29, 2025
129051a
pom: add the profile default-config-test-local, #TASK-5564
jtarraga Aug 1, 2025
71e1352
lib: download ChimerKB, ChimerPub and ChimerSeq from ChimerDB, #TASK-…
jtarraga Aug 5, 2025
f9c6e7e
lib: download and build ChimerPub and ChimerSeq data from ChimerDB fo…
jtarraga Aug 5, 2025
b147775
lib: update according to biodata changes, #TASK-7745, #TASK-7830, #TA…
jtarraga Aug 11, 2025
0d82f3b
lib: update according to biodata changes, and add mongodb-indexes, #T…
jtarraga Aug 11, 2025
4858472
lib: download CIViC data, #TASK-7903, #TASK-5564
jtarraga Sep 8, 2025
1b26b90
lib: implement CivicIndexer, #TASK-7903, #TASK-5564
jtarraga Sep 9, 2025
5e6d98c
lib: improve CIViC indexer, #TASK-7903, #TASK-5564
jtarraga Sep 12, 2025
3dd6361
lib: add CIViC additional properties in evidence entries, #TASK-7903,…
jtarraga Sep 12, 2025
ba0f293
server: improve the API key check before executing the CellBase endpo…
jtarraga Sep 16, 2025
926b3d2
lib: update PharmGKB to ClinPGx, #TASK-7911, #TASK-5564
jtarraga Sep 16, 2025
b951b3c
lib: add rating and evidence_level in additional properties, #TASK-79…
jtarraga Sep 17, 2025
e78048d
config: fix some version dates
imedina Sep 17, 2025
7536db4
core: update the configuration file, #TASK-5564
jtarraga Sep 18, 2025
a5961bb
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Sep 18, 2025
538212c
lib: fix clinical variant builder, #TASK-5564
jtarraga Sep 18, 2025
7a17ed5
lib: fix GWAS download path, #TASK-5564
jtarraga Sep 18, 2025
a8b6de7
lib: improve clinical variants download/build paths, #TASK-5564
jtarraga Sep 18, 2025
1e88be5
lib: improve CIViC indexer, #TASK-7903, #TASK-5564
jtarraga Sep 25, 2025
0b33170
app: fix CADD loading, #TASK-6142, #TASK-5564
jtarraga Nov 4, 2025
26e08f0
app: fix ClinPGx loader, #TASK-7911, #TASK-5564
jtarraga Nov 4, 2025
0a6c2d8
app: fix CIViC loader (clinical variants), #TASK-7903, #TASK-5564
jtarraga Nov 5, 2025
2ccfaa0
lib: add indexes for PGS collections, #TASK-5410, TASK-5564
jtarraga Nov 7, 2025
edf4515
lib: reduce batch size for PubMed data when loading, #TASK-6142, #TAS…
jtarraga Nov 10, 2025
7d1f297
lib: fix sonnar issues, #TASK-6142, #TASK-5564
jtarraga Nov 10, 2025
46de716
Merge branch 'release-6.x.x' into TASK-5564
jtarraga Nov 12, 2025
604e42e
core: fix configuration file for JUnit test, #TASK-5564
jtarraga Nov 12, 2025
28afbdd
lib: fix loader (PubMed), #TASK-6142, #TASK-5564
jtarraga Nov 18, 2025
abf94a0
server: improve default data releases for multiple species, #TASK-5564
jtarraga Nov 20, 2025
5addb64
server: improve messages in endpoint /meta/about, #TASK-5564
jtarraga Nov 20, 2025
c135e05
cicd: added to test-analysis -DCELLBASE.WAR.NAME=cellbase #TASK-5564
juanfeSanahuja Nov 20, 2025
b6e7363
cicd: added to test-analysis -DCELLBASE.WAR.NAME=cellbase #TASK-5564
juanfeSanahuja Nov 20, 2025
6f907ff
lib: use estimatedCount to speedup count queries, #TASK-5564
jtarraga Nov 21, 2025
1a60716
Merge branch 'TASK-5564' of https://github.com/opencb/cellbase into T…
jtarraga Nov 21, 2025
9b97afc
core: use the env. variable CELLBASE_SECRET_KEY, #TASK-8046, #TASK-5564
jtarraga Nov 21, 2025
0750f55
lib: add more indexes for the collection 'pubmed', #TASK-5564
jtarraga Nov 21, 2025
f9129c4
server: add some admin/endpoints to API key management, TASK-7912, #T…
jtarraga Nov 24, 2025
167dcf1
lib: add DbSnpDownloader, and remove building and loading dbSNP data …
jtarraga Dec 9, 2025
31286ce
lib: backwards compatibility, #TASK-5564
jtarraga Dec 12, 2025
0a2d312
lib: fix typo, #TASK-5564
jtarraga Dec 16, 2025
210a4d7
lib: catch exceptions in the different consequence type calculators, …
jtarraga Dec 16, 2025
3873120
lib: fix PGS include, and remove some System.out, #TASK-5564
jtarraga Dec 16, 2025
9fe6dec
app: implement a Python script to compare performances, #TASK-5564
jtarraga Dec 17, 2025
45b2c44
app: add Python script to get metrics for a given variant annotation …
jtarraga Dec 17, 2025
8f2da9c
lib: disable polygenic scores and mirna targets, to be enables in fut…
jtarraga Dec 17, 2025
6df8576
lib: fix NPE for conservation scores in breakends, #TASK-5564
jtarraga Jan 9, 2026
93e86fd
Merge branch 'release-6.x.x' into TASK-5564
jtarraga Jan 19, 2026
2 changes: 1 addition & 1 deletion .github/workflows/task.yml
@@ -17,5 +17,5 @@ jobs:
 uses: opencb/java-common-libs/.github/workflows/deploy-docker-hub-workflow.yml@develop
 needs: test
 with:
-cli: python3 ./build/cloud/docker/docker-build.py push --images base --tag ${{ github.ref_name }}
+cli: python3 ./build/cloud/docker/docker-build.py push --images base,builder --tag ${{ github.ref_name }}
 secrets: inherit
2 changes: 1 addition & 1 deletion .github/workflows/test-analysis.yml
@@ -21,7 +21,7 @@ jobs:
 uses: opencb/java-common-libs/.github/workflows/build-java-app-workflow.yml@develop
 with:
 needs_hadoop_preparation: false
-maven_opts: -Dcheckstyle.skip
+maven_opts: -Dcheckstyle.skip -DCELLBASE.WAR.NAME=cellbase
 upload_artifact: ${{ inputs.upload_artifact }}
 dependency_repos: "java-common-libs,biodata"
 secrets: inherit
10 changes: 7 additions & 3 deletions cellbase-app/app/cloud/docker/cellbase-builder/Dockerfile
@@ -11,7 +11,7 @@ LABEL org.label-schema.vendor="OpenCB" \
 ## We need to be root to install dependencies
 USER root
 RUN apt-get update -y && \
-apt-get install -y git default-mysql-client libjson-perl libdbi-perl libdbd-mysql-perl libdbd-mysql-perl libtry-tiny-perl && \
+apt-get install -y git default-mysql-client libjson-perl libdbi-perl libdbd-mysql-perl libdbd-mysql-perl libtry-tiny-perl libxml-simple-perl liblog-log4perl-perl libxml-parser-perl libxml-dom-perl && \
 mkdir /opt/ensembl && chown cellbase:cellbase /opt/ensembl && \
 rm -rf /var/lib/apt/lists/*

@@ -26,6 +26,10 @@ RUN cd /opt/ensembl && \
 git clone https://github.com/Ensembl/ensembl-variation.git && \
 git clone https://github.com/Ensembl/ensembl-funcgen.git && \
 git clone https://github.com/Ensembl/ensembl-compara.git && \
-git clone https://github.com/Ensembl/ensembl-io.git
+git clone https://github.com/Ensembl/ensembl-io.git && \
+git clone --branch cvs/release-0_7 https://github.com/biomart/biomart-perl

-ENV PERL5LIB=$PERL5LIB:/opt/ensembl/bioperl-live:/opt/ensembl/ensembl/modules:/opt/ensembl/ensembl-variation/modules:/opt/ensembl/ensembl-funcgen/modules:/opt/ensembl/ensembl-compara/modules:/opt/ensembl/lib/perl/5.18.2:/opt/cellbase
+## Give writting permissions to allow the script ensembl_canonical.pl to create sub-folder for cache purposes
+RUN chmod -R 777 /opt/cellbase/scripts/ensembl-scripts/
+
+ENV PERL5LIB=$PERL5LIB:/opt/ensembl/bioperl-live:/opt/ensembl/ensembl/modules:/opt/ensembl/ensembl-variation/modules:/opt/ensembl/ensembl-funcgen/modules:/opt/ensembl/ensembl-compara/modules:/opt/ensembl/lib/perl/5.18.2:/opt/cellbase/scripts/ensembl-scripts:/opt/ensembl/biomart-perl/lib
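Review note: the two `ENV PERL5LIB` values in the Dockerfile change are long enough that the actual difference is hard to spot by eye. The sketch below is only a reading aid, not part of the PR: it splits both values on `:` and lists which entries were dropped and which were added. The helper names `path_entries` and `diff_paths` are made up for this illustration; the two strings are copied from the diff.

```python
# Reading aid for the PERL5LIB change; helper names are hypothetical.
COMMON = (
    "$PERL5LIB:/opt/ensembl/bioperl-live:/opt/ensembl/ensembl/modules"
    ":/opt/ensembl/ensembl-variation/modules:/opt/ensembl/ensembl-funcgen/modules"
    ":/opt/ensembl/ensembl-compara/modules:/opt/ensembl/lib/perl/5.18.2"
)
OLD_PERL5LIB = COMMON + ":/opt/cellbase"
NEW_PERL5LIB = COMMON + ":/opt/cellbase/scripts/ensembl-scripts:/opt/ensembl/biomart-perl/lib"


def path_entries(value: str) -> list:
    """Split a colon-separated search path, dropping empty entries."""
    return [entry for entry in value.split(":") if entry]


def diff_paths(old: str, new: str) -> tuple:
    """Return (removed, added) entries, preserving their original order."""
    old_entries, new_entries = path_entries(old), path_entries(new)
    removed = [e for e in old_entries if e not in new_entries]
    added = [e for e in new_entries if e not in old_entries]
    return removed, added


if __name__ == "__main__":
    removed, added = diff_paths(OLD_PERL5LIB, NEW_PERL5LIB)
    print("removed:", removed)
    print("added:  ", added)
```

Read this way, the change replaces the broad `/opt/cellbase` entry with the scripts folder and appends the library path of the newly cloned `biomart-perl` checkout.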
17 changes: 7 additions & 10 deletions cellbase-app/app/scripts/ensembl-scripts/DB_CONFIG.pm
@@ -134,16 +134,13 @@ our $ENSEMBL_GENOMES_PORT = "4157";
 our $ENSEMBL_GENOMES_USER = "anonymous";

 ## Vertebrates
-our $HOMO_SAPIENS_CORE = "homo_sapiens_core_104_38";
-our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_104_38";
-our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_104_38";
-our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_104_38";
-#our $HOMO_SAPIENS_CORE = "homo_sapiens_core_78_38";
-#our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_78_38";
-#our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_78_38";
-our $MUS_MUSCULUS_CORE = "mus_musculus_core_78_38";
-our $MUS_MUSCULUS_VARIATION = "mus_musculus_variation_78_38";
-our $MUS_MUSCULUS_FUNCTIONAL = "mus_musculus_funcgen_78_38";
+our $HOMO_SAPIENS_CORE = "homo_sapiens_core_114_38";
+our $HOMO_SAPIENS_VARIATION = "homo_sapiens_variation_114_38";
+our $HOMO_SAPIENS_FUNCTIONAL = "homo_sapiens_funcgen_114_38";
+our $HOMO_SAPIENS_COMPARA = "homo_sapiens_compara_114_38";
+our $MUS_MUSCULUS_CORE = "mus_musculus_core_114_39";
+our $MUS_MUSCULUS_VARIATION = "mus_musculus_variation_114_39";
+our $MUS_MUSCULUS_FUNCTIONAL = "mus_musculus_funcgen_114_39";
 our $RATTUS_NORVEGICUS_CORE = "rattus_norvegicus_core_78_5";
 our $RATTUS_NORVEGICUS_VARIATION = "rattus_norvegicus_variation_78_5";
 our $RATTUS_NORVEGICUS_FUNCTIONAL = "rattus_norvegicus_funcgen_78_5";
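Review note: the database names in the DB_CONFIG.pm hunk appear to follow the pattern `<species>_<database type>_<release>_<assembly version>`, so the human and mouse entries change only in their last two fields. A minimal sketch of that reading, assuming the pattern holds for every entry; `ensembl_db_name` and `split_db_name` are hypothetical helpers and do not exist in DB_CONFIG.pm.

```python
# Hypothetical helpers illustrating the apparent naming pattern
# <species>_<db type>_<release>_<assembly version>; not part of DB_CONFIG.pm.

def ensembl_db_name(species: str, db_type: str, release: int, assembly: str) -> str:
    """Build a name such as 'homo_sapiens_core_114_38'."""
    prefix = species.strip().lower().replace(" ", "_")
    return f"{prefix}_{db_type}_{release}_{assembly}"


def split_db_name(name: str) -> dict:
    """Split a database name back into its four assumed parts."""
    # The last two fields are release and assembly; the one before is the type.
    species, db_type, release, assembly = name.rsplit("_", 3)
    return {
        "species": species,
        "db_type": db_type,
        "release": int(release),
        "assembly": assembly,
    }


if __name__ == "__main__":
    print(ensembl_db_name("Homo sapiens", "core", 114, "38"))
    print(ensembl_db_name("Mus musculus", "funcgen", 114, "39"))
    print(split_db_name("rattus_norvegicus_core_78_5"))
```

Under this reading, the unchanged context lines show that the `rattus_norvegicus` entries still point at release 78.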
61 changes: 61 additions & 0 deletions cellbase-app/app/scripts/ensembl-scripts/ensembl_canonical.pl
@@ -0,0 +1,61 @@
+#!/usr/bin/env perl
+
+use strict;
+use Getopt::Long;
+use Data::Dumper;
+use JSON;
+use DB_CONFIG;
+
+use BioMart::Initializer;
+use BioMart::Query;
+use BioMart::QueryRunner;
+
+## Default values
+my $species = 'hsapiens';
+my $outdir = "./";
+
+## Parsing command line
+GetOptions ('species=s' => \$species, 'outdir=s' => \$outdir);
+
+
+my $confFile = "/opt/cellbase/scripts/ensembl-scripts/martURLLocation.xml";
+
+# NB: change action to 'clean' if you wish to start a fresh configuration
+# and to 'cached' if you want to skip configuration step on subsequent runs from the same registry
+my $action='clean';
+my $initializer = BioMart::Initializer->new('registryFile'=>$confFile, 'action'=>$action);
+my $registry = $initializer->getRegistry;
+
+my $query = BioMart::Query->new('registry'=>$registry,'virtualSchemaName'=>'default');
+
+$query->setDataset($species."_gene_ensembl");
+
+$query->addAttribute("ensembl_gene_id");
+$query->addAttribute("ensembl_transcript_id");
+$query->addAttribute("transcript_is_canonical");
+
+$query->formatter("TSV");
+
+# Open the file for writing
+open(my $fh, '>', "$outdir/ensembl_canonical.txt") or die "Cannot open ensembl_canonical.txt file: $!";
+
+# Save the original stdout
+my $original_stdout = *STDOUT;
+open(STDOUT, '>&', $fh) or die "Can't redirect STDOUT: $!";
+
+my $query_runner = BioMart::QueryRunner->new();
+
+# to obtain unique rows only
+$query_runner->uniqueRowsOnly(1);
+$query_runner->execute($query);
+#$query_runner->printHeader();
+#print ENSEMBL_CANONICAL $query_runner->printResults();
+# Call printResults which prints to STDOUT (now redirected to the file)
+$query_runner->printResults();
+#$query_runner->printFooter();
+
+# Restore the original stdout
+open(STDOUT, '>&', $original_stdout) or die "Can't restore STDOUT: $!";
+
+# Close the filehandle
+close($fh) or die "Failed to close file: $!";
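Review note: ensembl_canonical.pl captures the query output by pointing STDOUT at a file while `printResults()` runs, then restoring it. The same pattern can be sketched in Python, which may help when comparing this script with the Python tooling added elsewhere in the PR. This is an analogue under stated assumptions, not a port: `print_results` is a stand-in for a library call that can only print to stdout, and the row it prints is a made-up placeholder rather than real BioMart output.

```python
import contextlib
import os
import tempfile


def print_results() -> None:
    """Stand-in for a library call that can only print to stdout."""
    # Placeholder row: gene id, transcript id, canonical flag (made-up values).
    print("GENE_ID_PLACEHOLDER\tTRANSCRIPT_ID_PLACEHOLDER\t1")


def write_results(outdir: str, filename: str = "ensembl_canonical.txt") -> str:
    """Run print_results() with stdout redirected to outdir/filename."""
    path = os.path.join(outdir, filename)
    with open(path, "w") as fh, contextlib.redirect_stdout(fh):
        print_results()
    # Leaving the with-block restores stdout and closes the file.
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = write_results(tmp)
        with open(out) as fh:
            print(fh.read(), end="")
```

The context manager restores the previous stream even if the wrapped call raises, so no explicit save and restore steps are needed.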
8 changes: 5 additions & 3 deletions cellbase-app/app/scripts/ensembl-scripts/gene_extra_info.pl
@@ -16,7 +16,9 @@
 ####################################################################
 ## Parsing command line options ####################################
 ####################################################################
-# USAGE: ./gene_extra_info.pl --species "Homo sapiens" --outdir ../../appl_db/ird_v1/hsa ...
+##docker run -it --mount type=bind,source=/tmp,target=/tmp opencb/cellbase-builder:6.2.0-SNAPSHOT /opt/cellbase/scripts/ensembl-scripts/gene_extra_info.pl -s "Mus musculus" -o /tmp
+
+# USAGE: ./gene_extra_info.pl --species "Homo sapiens" --assembly "GRCh38" --outdir ../../appl_db/ird_v1/hsa ...

 ## Parsing command line
 GetOptions ('species=s' => \$species, 'assembly=s' => \$assembly, 'outdir=s' => \$outdir, 'phylo=s' => \$phylo,
@@ -50,8 +52,8 @@

 if ($phylo eq "" || $phylo eq "vertebrate") {
 print ("In vertebrates section\n");
-if ($species eq "Homo sapiens" && $assembly eq "GRCh38") {
-print ("Human selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
+if ($species eq "Homo sapiens" || $species eq "Mus musculus") {
+print ($species." selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
 Bio::EnsEMBL::Registry->load_registry_from_db(
 -host => $ENSEMBL_HOST,
 -user => $ENSEMBL_USER,
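Review note: the condition in gene_extra_info.pl changes from "human and GRCh38" to "human or mouse", which means the assembly no longer takes part in selecting this branch. The sketch below restates both conditions as plain predicates to make that visible. The function names are invented for this note, and what the script does when the condition is false is not shown in the hunk, so it is not modeled here.

```python
# Both predicates restate the Perl conditions from the hunk; names are hypothetical.

def takes_branch_before(species: str, assembly: str) -> bool:
    """Old condition: $species eq "Homo sapiens" && $assembly eq "GRCh38"."""
    return species == "Homo sapiens" and assembly == "GRCh38"


def takes_branch_after(species: str, assembly: str) -> bool:
    """New condition: $species eq "Homo sapiens" || $species eq "Mus musculus"."""
    # The assembly argument is kept only so both predicates share a signature.
    return species in ("Homo sapiens", "Mus musculus")


CASES = [
    ("Homo sapiens", "GRCh38"),
    ("Homo sapiens", "GRCh37"),
    ("Mus musculus", "GRCm39"),
    ("Rattus norvegicus", "unspecified"),
]

if __name__ == "__main__":
    for species, assembly in CASES:
        before = takes_branch_before(species, assembly)
        after = takes_branch_after(species, assembly)
        print(f"{species:18} {assembly:12} before={before} after={after}")
```

One consequence worth confirming with the author: a human run with an assembly other than GRCh38 now reaches the `load_registry_from_db` call, whereas before it did not.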
32 changes: 12 additions & 20 deletions cellbase-app/app/scripts/ensembl-scripts/genome_info.pl
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,9 @@
####################################################################
## Parsing command line options ####################################
####################################################################
# USAGE: ./genome_info.pl --species "Homo sapiens" --outfile ../../appl_db/ird_v1/hsa ...
##docker run -it --mount type=bind,source=/tmp,target=/tmp opencb/cellbase-builder:6.2.0-SNAPSHOT /opt/cellbase/scripts/ensembl-scripts/genome_info.pl --species "Mus musculus" --assembly GRCm39 --outfile /tmp

# USAGE: ./genome_info.pl --species "Homo sapiens" --assembly GRCh38 --outfile ../../appl_db/ird_v1/hsa ...

## Parsing command line
GetOptions ('species=s' => \$species, 'assembly=s' => \$assembly, 'o|outfile=s' => \$outfile, 'phylo=s' => \$phylo,
Expand All @@ -29,7 +31,6 @@

if ($outfile eq "") {
$outfile = "/ensembl-data/genome_info.json";
- # $outfile = "/ensembl-data/$species.json";
}

####################################################################
@@ -42,17 +43,13 @@
# Bio::EnsEMBL::Registry->load_all("$ENSEMBL_REGISTRY");
if($phylo eq "" || $phylo eq "vertebrate") {
print ("In vertebrates section\n");
- if ($species eq "Homo sapiens" && $assembly eq "GRCh38") {
- print ("Human selected, assembly ".$assembly." selected, connecting to port ".$ENSEMBL_PORT."\n");
- Bio::EnsEMBL::Registry->load_registry_from_db(
- -host => $ENSEMBL_HOST,
- -user => $ENSEMBL_USER,
- -port => $ENSEMBL_PORT,
- -verbose => $verbose
- );
- } else {
- print ("Human selected, assembly ".$assembly." no supported\n");
- }
+ print ("Species: ".$species.", assembly ".$assembly.", connecting to: ".$ENSEMBL_HOST.":".$ENSEMBL_PORT."\n");
+ Bio::EnsEMBL::Registry->load_registry_from_db(
+ -host => $ENSEMBL_HOST,
+ -user => $ENSEMBL_USER,
+ -port => $ENSEMBL_PORT,
+ -verbose => $verbose
+ );
} else {
print ("In no-vertebrates section\n");
Bio::EnsEMBL::Registry->load_registry_from_db(
@@ -64,7 +61,6 @@

my $slice_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "Slice");
my $karyotype_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "KaryotypeBand");
- # my $gene_adaptor = Bio::EnsEMBL::Registry->get_adaptor($species, "core", "Gene");
####################################################################

my %info_stats = ();
@@ -81,12 +77,10 @@
$chromosome{'start'} = int($chrom->start());
$chromosome{'end'} = int($chrom->end());
$chromosome{'size'} = int($chrom->seq_region_length());
- # $chromosome{'numberGenes'} = scalar @{$chrom->get_all_Genes()};
$chromosome{'isCircular'} = $chrom->is_circular();

my @cytobands = ();
foreach my $cyto(@{$karyotype_adaptor->fetch_all_by_chr_name($chrom->seq_region_name)}) {
- # print $cytoband->name."\n";
my %cytoband = ();
$cytoband{'name'} = $cyto->name();
$cytoband{'start'} = int($cyto->start());
@@ -96,7 +90,7 @@
push(@cytobands, \%cytoband);
}

- ## check if any cytoband has been added
+ ## Check if any cytoband has been added
## If not a unique cytoband covering all chromosome is added.
if(@cytobands == 0) {
my %cytoband = ();
@@ -110,7 +104,6 @@
$chromosome{'cytobands'} = \@cytobands;

push(@chromosomes, \%chromosome);
- # push(@chrom_ids, $chrom->seq_region_name);
}
$info_stats{'chromosomes'} = \@chromosomes;

@@ -124,7 +117,6 @@
$supercontig{'start'} = int($supercon->start());
$supercontig{'end'} = int($supercon->end());
$supercontig{'size'} = int($supercon->seq_region_length());
- # $supercontig{'numberGenes'} = scalar @{$supercon->get_all_Genes()};
$supercontig{'isCircular'} = $supercon->is_circular();

## Adding an unique cytoband covering all chromosome is added.
@@ -151,7 +143,7 @@

sub print_parameters {
print "Parameters: ";
- print "species: $species, outfile: $outfile, ";
+ print "species: $species, assembly: $assembly, outfile: $outfile, ";
print "ensembl-registry: $ENSEMBL_REGISTRY, ";
print "ensembl-host: $ENSEMBL_HOST, ensembl-port: $ENSEMBL_PORT, ";
print "ensembl-user: $ENSEMBL_USER, verbose: $verbose, help: $help";
19 changes: 19 additions & 0 deletions cellbase-app/app/scripts/ensembl-scripts/martURLLocation.xml
@@ -0,0 +1,19 @@
<!--
~ Copyright 2015-2020 OpenCB
~
~ Licensed under the Apache License, Version 2.0 (the "License");
~ you may not use this file except in compliance with the License.
~ You may obtain a copy of the License at
~
~ http://www.apache.org/licenses/LICENSE-2.0
~
~ Unless required by applicable law or agreed to in writing, software
~ distributed under the License is distributed on an "AS IS" BASIS,
~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~ See the License for the specific language governing permissions and
~ limitations under the License.
-->

<MartRegistry>
<MartURLLocation database="ensembl_mart_111" default="1" displayName="Ensembl Genes 111" host="www.ensembl.org" includeDatasets="" martUser="" name="ENSEMBL_MART_ENSEMBL" path="/biomart/martservice" port="80" serverVirtualSchema="default" visible="1" />
</MartRegistry>